From 2cbd866dd858ceebe2b735e3d8cd8eb702ca1f88 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stefan=20Br=C3=BCns?= Date: Fri, 3 Nov 2017 03:01:53 +0100 Subject: [PATCH 0001/1013] platform/x86: hp-wmi: Fix tablet mode detection for convertibles MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 9968e12a291e639dd51d1218b694d440b22a917f upstream. Commit f9cf3b2880cc ("platform/x86: hp-wmi: Refactor dock and tablet state fetchers") consolidated the methods for docking and laptop mode detection, but omitted to apply the correct mask for the laptop mode (it always uses the constant for docking). Fixes: f9cf3b2880cc ("platform/x86: hp-wmi: Refactor dock and tablet state fetchers") Signed-off-by: Stefan Brüns Cc: Michel Dänzer Signed-off-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman --- drivers/platform/x86/hp-wmi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/platform/x86/hp-wmi.c b/drivers/platform/x86/hp-wmi.c index b4ed3dc983d5..b4224389febe 100644 --- a/drivers/platform/x86/hp-wmi.c +++ b/drivers/platform/x86/hp-wmi.c @@ -297,7 +297,7 @@ static int hp_wmi_hw_state(int mask) if (state < 0) return state; - return state & 0x1; + return !!(state & mask); } static int __init hp_wmi_bios_2008_later(void) From b8d0c953d928fbdc3207f741a2b77055d34fd0b9 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Wed, 29 Nov 2017 16:09:54 -0800 Subject: [PATCH 0002/1013] mm, memory_hotplug: do not back off draining pcp free pages from kworker context commit 4b81cb2ff69c8a8e297a147d2eb4d9b5e8d7c435 upstream. drain_all_pages backs off when called from a kworker context since commit 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue context") because the original IPI based pcp draining has been replaced by a WQ based one and the check wanted to prevent from recursion and inter workers dependencies. This has made some sense at the time because the system WQ has been used and one worker holding the lock could be blocked while waiting for new workers to emerge which can be a problem under OOM conditions. Since then commit ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq") has moved draining to a dedicated (mm_percpu_wq) WQ with a rescuer so we shouldn't depend on any other WQ activity to make a forward progress so calling drain_all_pages from a worker context is safe as long as this doesn't happen from mm_percpu_wq itself which is not the case because all workers are required to _not_ depend on any MM locks. Why is this a problem in the first place? ACPI driven memory hot-remove (acpi_device_hotplug) is executed from the worker context. We end up calling __offline_pages to free all the pages and that requires both lru_add_drain_all_cpuslocked and drain_all_pages to do their job otherwise we can have dangling pages on pcp lists and fail the offline operation (__test_page_isolated_in_pageblock would see a page with 0 ref count but without PageBuddy set). Fix the issue by removing the worker check in drain_all_pages. lru_add_drain_all_cpuslocked doesn't have this restriction so it works as expected. 
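For illustration, a minimal userspace model of the masking bug fixed by the hp-wmi patch at the start of this series: the old helper always tested bit 0 (the dock bit) regardless of which feature the caller asked about. The bit values below are made-up stand-ins, not necessarily the driver's real HPWMI_* constants.

    #include <stdio.h>

    #define DOCK_MASK   0x01   /* illustrative bit assignments only */
    #define TABLET_MASK 0x04

    /* Old behaviour: the mask argument was ignored and bit 0 was always tested. */
    static int hw_state_old(int state, int mask)
    {
        (void)mask;
        return state & 0x1;
    }

    /* Fixed behaviour: test the bit the caller actually asked about. */
    static int hw_state_new(int state, int mask)
    {
        return !!(state & mask);
    }

    int main(void)
    {
        int state = TABLET_MASK;   /* convertible reports tablet mode, not docked */

        printf("old tablet check: %d\n", hw_state_old(state, TABLET_MASK)); /* 0, wrong */
        printf("new tablet check: %d\n", hw_state_new(state, TABLET_MASK)); /* 1, fixed */
        return 0;
    }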
Link: http://lkml.kernel.org/r/20170828093341.26341-1-mhocko@kernel.org Fixes: 0ccce3b924212 ("mm, page_alloc: drain per-cpu pages from workqueue context") Signed-off-by: Michal Hocko Cc: Mel Gorman Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/page_alloc.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 82a6270c9743..286de4cee452 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2487,10 +2487,6 @@ void drain_all_pages(struct zone *zone) if (WARN_ON_ONCE(!mm_percpu_wq)) return; - /* Workqueues cannot recurse */ - if (current->flags & PF_WQ_WORKER) - return; - /* * Do not drain if one is already in progress unless it's specific to * a zone. Such callers are primarily CMA and memory hotplug and need From 786b924d39bad16ff99aacdb4076df027cc2f8b8 Mon Sep 17 00:00:00 2001 From: Wang Nan Date: Wed, 29 Nov 2017 16:09:58 -0800 Subject: [PATCH 0003/1013] mm, oom_reaper: gather each vma to prevent leaking TLB entry commit 687cb0884a714ff484d038e9190edc874edcf146 upstream. tlb_gather_mmu(&tlb, mm, 0, -1) means gathering the whole virtual memory space. In this case, tlb->fullmm is true. Some archs like arm64 doesn't flush TLB when tlb->fullmm is true: commit 5a7862e83000 ("arm64: tlbflush: avoid flushing when fullmm == 1"). Which causes leaking of tlb entries. Will clarifies his patch: "Basically, we tag each address space with an ASID (PCID on x86) which is resident in the TLB. This means we can elide TLB invalidation when pulling down a full mm because we won't ever assign that ASID to another mm without doing TLB invalidation elsewhere (which actually just nukes the whole TLB). I think that means that we could potentially not fault on a kernel uaccess, because we could hit in the TLB" There could be a window between complete_signal() sending IPI to other cores and all threads sharing this mm are really kicked off from cores. In this window, the oom reaper may calls tlb_flush_mmu_tlbonly() to flush TLB then frees pages. However, due to the above problem, the TLB entries are not really flushed on arm64. Other threads are possible to access these pages through TLB entries. Moreover, a copy_to_user() can also write to these pages without generating page fault, causes use-after-free bugs. This patch gathers each vma instead of gathering full vm space. In this case tlb->fullmm is not true. The behavior of oom reaper become similar to munmapping before do_exit, which should be safe for all archs. 
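To make the TLB argument above concrete, here is a side-by-side sketch of the two gather patterns, using the same 4.14-era helpers that appear in the diff below; this is kernel-internal code shown only for illustration and is not buildable on its own.

    /*
     * Old pattern: a single gather over the whole address space.  tlb->fullmm
     * is set, so arm64 (commit 5a7862e83000) elides the TLB invalidation.
     */
    tlb_gather_mmu(&tlb, mm, 0, -1);
    /* ... unmap_page_range() for every candidate vma ... */
    tlb_finish_mmu(&tlb, 0, -1);

    /*
     * New pattern: one gather per vma.  tlb->fullmm stays clear, so the
     * finish step really invalidates the entries for [vm_start, vm_end).
     */
    tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end);
    unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, NULL);
    tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end);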
Link: http://lkml.kernel.org/r/20171107095453.179940-1-wangnan0@huawei.com Fixes: aac453635549 ("mm, oom: introduce oom reaper") Signed-off-by: Wang Nan Acked-by: Michal Hocko Acked-by: David Rientjes Cc: Minchan Kim Cc: Will Deacon Cc: Bob Liu Cc: Ingo Molnar Cc: Roman Gushchin Cc: Konstantin Khlebnikov Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/oom_kill.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index dee0f75c3013..18c5b356b505 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -532,7 +532,6 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) */ set_bit(MMF_UNSTABLE, &mm->flags); - tlb_gather_mmu(&tlb, mm, 0, -1); for (vma = mm->mmap ; vma; vma = vma->vm_next) { if (!can_madv_dontneed_vma(vma)) continue; @@ -547,11 +546,13 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, struct mm_struct *mm) * we do not want to block exit_mmap by keeping mm ref * count elevated without a good reason. */ - if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) + if (vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED)) { + tlb_gather_mmu(&tlb, mm, vma->vm_start, vma->vm_end); unmap_page_range(&tlb, vma, vma->vm_start, vma->vm_end, NULL); + tlb_finish_mmu(&tlb, vma->vm_start, vma->vm_end); + } } - tlb_finish_mmu(&tlb, 0, -1); pr_info("oom_reaper: reaped process %d (%s), now anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n", task_pid_nr(tsk), tsk->comm, K(get_mm_counter(mm, MM_ANONPAGES)), From 01ca9727457a167463a47e35b6fe5a5173b4e341 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Mon, 27 Nov 2017 06:21:25 +0300 Subject: [PATCH 0004/1013] mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d() commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream. Currently, we unconditionally make page table dirty in touch_pmd(). It may result in false-positive can_follow_write_pmd(). We may avoid the situation, if we would only make the page table entry dirty if caller asks for write access -- FOLL_WRITE. The patch also changes touch_pud() in the same way. Signed-off-by: Kirill A. Shutemov Cc: Michal Hocko Cc: Hugh Dickins Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/huge_memory.c | 36 +++++++++++++----------------------- 1 file changed, 13 insertions(+), 23 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 1981ed697dab..eba34cdfc3e5 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -842,20 +842,15 @@ EXPORT_SYMBOL_GPL(vmf_insert_pfn_pud); #endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ static void touch_pmd(struct vm_area_struct *vma, unsigned long addr, - pmd_t *pmd) + pmd_t *pmd, int flags) { pmd_t _pmd; - /* - * We should set the dirty bit only for FOLL_WRITE but for now - * the dirty bit in the pmd is meaningless. And if the dirty - * bit will become meaningful and we'll only set it with - * FOLL_WRITE, an atomic set_bit will be required on the pmd to - * set the young bit, instead of the current set_pmd_at. 
- */ - _pmd = pmd_mkyoung(pmd_mkdirty(*pmd)); + _pmd = pmd_mkyoung(*pmd); + if (flags & FOLL_WRITE) + _pmd = pmd_mkdirty(_pmd); if (pmdp_set_access_flags(vma, addr & HPAGE_PMD_MASK, - pmd, _pmd, 1)) + pmd, _pmd, flags & FOLL_WRITE)) update_mmu_cache_pmd(vma, addr, pmd); } @@ -884,7 +879,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr, return NULL; if (flags & FOLL_TOUCH) - touch_pmd(vma, addr, pmd); + touch_pmd(vma, addr, pmd, flags); /* * device mapped pages can only be returned if the @@ -995,20 +990,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm, #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD static void touch_pud(struct vm_area_struct *vma, unsigned long addr, - pud_t *pud) + pud_t *pud, int flags) { pud_t _pud; - /* - * We should set the dirty bit only for FOLL_WRITE but for now - * the dirty bit in the pud is meaningless. And if the dirty - * bit will become meaningful and we'll only set it with - * FOLL_WRITE, an atomic set_bit will be required on the pud to - * set the young bit, instead of the current set_pud_at. - */ - _pud = pud_mkyoung(pud_mkdirty(*pud)); + _pud = pud_mkyoung(*pud); + if (flags & FOLL_WRITE) + _pud = pud_mkdirty(_pud); if (pudp_set_access_flags(vma, addr & HPAGE_PUD_MASK, - pud, _pud, 1)) + pud, _pud, flags & FOLL_WRITE)) update_mmu_cache_pud(vma, addr, pud); } @@ -1031,7 +1021,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr, return NULL; if (flags & FOLL_TOUCH) - touch_pud(vma, addr, pud); + touch_pud(vma, addr, pud, flags); /* * device mapped pages can only be returned if the @@ -1407,7 +1397,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma, page = pmd_page(*pmd); VM_BUG_ON_PAGE(!PageHead(page) && !is_zone_device_page(page), page); if (flags & FOLL_TOUCH) - touch_pmd(vma, addr, pmd); + touch_pmd(vma, addr, pmd, flags); if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) { /* * We don't mlock() pte-mapped THPs. This way we can avoid From b4c8fce668423c0dde23fe57682e5bd8df5bbeb9 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Wed, 29 Nov 2017 16:10:01 -0800 Subject: [PATCH 0005/1013] mm/cma: fix alloc_contig_range ret code/potential leak commit 63cd448908b5eb51d84c52f02b31b9b4ccd1cb5a upstream. If the call __alloc_contig_migrate_range() in alloc_contig_range returns -EBUSY, processing continues so that test_pages_isolated() is called where there is a tracepoint to identify the busy pages. However, it is possible for busy pages to become available between the calls to these two routines. In this case, the range of pages may be allocated. Unfortunately, the original return code (ret == -EBUSY) is still set and returned to the caller. Therefore, the caller believes the pages were not allocated and they are leaked. Update the comment to indicate that allocation is still possible even if __alloc_contig_migrate_range returns -EBUSY. Also, clear return code in this case so that it is not accidentally used or returned to caller. 
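The fall-through described above is easy to model outside the kernel. The toy below is not the real allocator, but it shows why the stale -EBUSY has to be cleared once control falls through to the isolation test.

    #include <stdio.h>
    #include <errno.h>

    static int migrate_step(void)   { return -EBUSY; }  /* pages were busy...    */
    static int isolation_test(void) { return 0; }       /* ...but freed up later */

    static int alloc_range(void)
    {
        int ret = migrate_step();

        if (ret && ret != -EBUSY)
            return ret;
        ret = 0;    /* the fix: drop the stale -EBUSY before falling through */

        if (isolation_test())
            return -EBUSY;
        return ret; /* without "ret = 0" this would report failure for an
                       allocation that actually succeeded */
    }

    int main(void)
    {
        printf("alloc_range() -> %d\n", alloc_range());
        return 0;
    }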
Link: http://lkml.kernel.org/r/20171122185214.25285-1-mike.kravetz@oracle.com Fixes: 8ef5849fa8a2 ("mm/cma: always check which page caused allocation failure") Signed-off-by: Mike Kravetz Acked-by: Vlastimil Babka Acked-by: Michal Hocko Acked-by: Johannes Weiner Acked-by: Joonsoo Kim Cc: Michal Nazarewicz Cc: Laura Abbott Cc: Michal Hocko Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/page_alloc.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 286de4cee452..d51c2087c498 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -7587,11 +7587,18 @@ int alloc_contig_range(unsigned long start, unsigned long end, /* * In case of -EBUSY, we'd like to know which page causes problem. - * So, just fall through. We will check it in test_pages_isolated(). + * So, just fall through. test_pages_isolated() has a tracepoint + * which will report the busy page. + * + * It is possible that busy pages could become available before + * the call to test_pages_isolated, and the range will actually be + * allocated. So, if we fall through be sure to clear ret so that + * -EBUSY is not accidentally used or returned to caller. */ ret = __alloc_contig_migrate_range(&cc, start, end); if (ret && ret != -EBUSY) goto done; + ret =0; /* * Pages from [start, end) are within a MAX_ORDER_NR_PAGES From bf55918cb4fb374000ad1d063b207a17a93e8726 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 29 Nov 2017 16:10:06 -0800 Subject: [PATCH 0006/1013] mm: fix device-dax pud write-faults triggered by get_user_pages() commit 1501899a898dfb5477c55534bdfd734c046da06d upstream. Currently only get_user_pages_fast() can safely handle the writable gup case due to its use of pud_access_permitted() to check whether the pud entry is writable. In the gup slow path pud_write() is used instead of pud_access_permitted() and to date it has been unimplemented, just calls BUG_ON(). kernel BUG at ./include/linux/hugetlb.h:244! [..] RIP: 0010:follow_devmap_pud+0x482/0x490 [..] Call Trace: follow_page_mask+0x28c/0x6e0 __get_user_pages+0xe4/0x6c0 get_user_pages_unlocked+0x130/0x1b0 get_user_pages_fast+0x89/0xb0 iov_iter_get_pages_alloc+0x114/0x4a0 nfs_direct_read_schedule_iovec+0xd2/0x350 ? nfs_start_io_direct+0x63/0x70 nfs_file_direct_read+0x1e0/0x250 nfs_file_read+0x90/0xc0 For now this just implements a simple check for the _PAGE_RW bit similar to pmd_write. However, this implies that the gup-slow-path check is missing the extra checks that the gup-fast-path performs with pud_access_permitted. Later patches will align all checks to use the 'access_permitted' helper if the architecture provides it. Note that the generic 'access_permitted' helper fallback is the simple _PAGE_RW check on architectures that do not define the 'access_permitted' helper(s). [dan.j.williams@intel.com: fix powerpc compile error] Link: http://lkml.kernel.org/r/151129126165.37405.16031785266675461397.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151043109938.2842.14834662818213616199.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: a00cc7d9dd93 ("mm, x86: add support for PUD-sized transparent hugepages") Signed-off-by: Dan Williams Reported-by: Stephen Rothwell Acked-by: Thomas Gleixner [x86] Cc: Kirill A. Shutemov Cc: Catalin Marinas Cc: "David S. Miller" Cc: Dave Hansen Cc: Will Deacon Cc: "H. 
Peter Anvin" Cc: Ingo Molnar Cc: Arnd Bergmann Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 6 ++++++ include/asm-generic/pgtable.h | 8 ++++++++ include/linux/hugetlb.h | 8 -------- 3 files changed, 14 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index f735c3016325..f02de8bc1f72 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1093,6 +1093,12 @@ static inline void pmdp_set_wrprotect(struct mm_struct *mm, clear_bit(_PAGE_BIT_RW, (unsigned long *)pmdp); } +#define pud_write pud_write +static inline int pud_write(pud_t pud) +{ + return pud_flags(pud) & _PAGE_RW; +} + /* * clone_pgd_range(pgd_t *dst, pgd_t *src, int count); * diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 757dc6ffc7ba..1ac457511f4e 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -814,6 +814,14 @@ static inline int pmd_write(pmd_t pmd) #endif /* __HAVE_ARCH_PMD_WRITE */ #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#ifndef pud_write +static inline int pud_write(pud_t pud) +{ + BUG(); + return 0; +} +#endif /* pud_write */ + #if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \ (defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ !defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index fbf5b31d47ee..82a25880714a 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -239,14 +239,6 @@ static inline int pgd_write(pgd_t pgd) } #endif -#ifndef pud_write -static inline int pud_write(pud_t pud) -{ - BUG(); - return 0; -} -#endif - #define HUGETLB_ANON_FILE "anon_hugepage" enum { From c6c78a1d4561880c744d3ec932264b264a879e7c Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 29 Nov 2017 16:10:28 -0800 Subject: [PATCH 0007/1013] mm, hugetlbfs: introduce ->split() to vm_operations_struct commit 31383c6865a578834dd953d9dbc88e6b19fe3997 upstream. Patch series "device-dax: fix unaligned munmap handling" When device-dax is operating in huge-page mode we want it to behave like hugetlbfs and fail attempts to split vmas into unaligned ranges. It would be messy to teach the munmap path about device-dax alignment constraints in the same (hstate) way that hugetlbfs communicates this constraint. Instead, these patches introduce a new ->split() vm operation. This patch (of 2): The device-dax interface has similar constraints as hugetlbfs in that it requires the munmap path to unmap in huge page aligned units. Rather than add more custom vma handling code in __split_vma() introduce a new vm operation to perform this vma specific check. 
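A sketch of how a driver with a fixed huge-page granularity might wire up the new hook. The my_drv_* names and the 2MB unit are hypothetical; only the .split member and its signature come from this patch.

    /* Refuse munmap() boundaries that are not aligned to the mapping unit,
     * mirroring what hugetlbfs derives from its hstate. */
    static int my_drv_split(struct vm_area_struct *vma, unsigned long addr)
    {
            if (!IS_ALIGNED(addr, SZ_2M))
                    return -EINVAL;
            return 0;
    }

    static const struct vm_operations_struct my_drv_vm_ops = {
            /* .fault and friends omitted for brevity */
            .split = my_drv_split,
    };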
Link: http://lkml.kernel.org/r/151130418135.4029.6783191281930729710.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams Cc: Jeff Moyer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/mm.h | 1 + mm/hugetlb.c | 8 ++++++++ mm/mmap.c | 8 +++++--- 3 files changed, 14 insertions(+), 3 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 43edf659453b..c96a8b769e3d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -367,6 +367,7 @@ enum page_entry_size { struct vm_operations_struct { void (*open)(struct vm_area_struct * area); void (*close)(struct vm_area_struct * area); + int (*split)(struct vm_area_struct * area, unsigned long addr); int (*mremap)(struct vm_area_struct * area); int (*fault)(struct vm_fault *vmf); int (*huge_fault)(struct vm_fault *vmf, enum page_entry_size pe_size); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2d2ff5e8bf2b..a2233d722ff9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3125,6 +3125,13 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma) } } +static int hugetlb_vm_op_split(struct vm_area_struct *vma, unsigned long addr) +{ + if (addr & ~(huge_page_mask(hstate_vma(vma)))) + return -EINVAL; + return 0; +} + /* * We cannot handle pagefaults against hugetlb pages at all. They cause * handle_mm_fault() to try to instantiate regular-sized pages in the @@ -3141,6 +3148,7 @@ const struct vm_operations_struct hugetlb_vm_ops = { .fault = hugetlb_vm_op_fault, .open = hugetlb_vm_op_open, .close = hugetlb_vm_op_close, + .split = hugetlb_vm_op_split, }; static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, diff --git a/mm/mmap.c b/mm/mmap.c index 680506faceae..476e810cf100 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2540,9 +2540,11 @@ int __split_vma(struct mm_struct *mm, struct vm_area_struct *vma, struct vm_area_struct *new; int err; - if (is_vm_hugetlb_page(vma) && (addr & - ~(huge_page_mask(hstate_vma(vma))))) - return -EINVAL; + if (vma->vm_ops && vma->vm_ops->split) { + err = vma->vm_ops->split(vma, addr); + if (err) + return err; + } new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); if (!new) From 61586a2286140935fe711b3410c53ee1141633b2 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 29 Nov 2017 16:10:32 -0800 Subject: [PATCH 0008/1013] device-dax: implement ->split() to catch invalid munmap attempts commit 9702cffdbf2129516db679e4467db81e1cd287da upstream. Similar to how device-dax enforces that the 'address', 'offset', and 'len' parameters to mmap() be aligned to the device's fundamental alignment, the same constraints apply to munmap(). Implement ->split() to fail munmap calls that violate the alignment constraint. Otherwise, we later fail VM_BUG_ON checks in the unmap_page_range() path with crash signatures of the form: vma ffff8800b60c8a88 start 00007f88c0000000 end 00007f88c0e00000 next (null) prev (null) mm ffff8800b61150c0 prot 8000000000000027 anon_vma (null) vm_ops ffffffffa0091240 pgoff 0 file ffff8800b638ef80 private_data (null) flags: 0x380000fb(read|write|shared|mayread|maywrite|mayexec|mayshare|softdirty|mixedmap|hugepage) ------------[ cut here ]------------ kernel BUG at mm/huge_memory.c:2014! [..] RIP: 0010:__split_huge_pud+0x12a/0x180 [..] Call Trace: unmap_page_range+0x245/0xa40 ? __vma_adjust+0x301/0x990 unmap_vmas+0x4c/0xa0 unmap_region+0xae/0x120 ? 
__vma_rb_erase+0x11a/0x230 do_munmap+0x276/0x410 vm_munmap+0x6a/0xa0 SyS_munmap+0x1d/0x30 Link: http://lkml.kernel.org/r/151130418681.4029.7118245855057952010.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: dee410792419 ("/dev/dax, core: file operations and dax-mmap") Signed-off-by: Dan Williams Reported-by: Jeff Moyer Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- drivers/dax/device.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index e9f3b3e4bbf4..375b99bca002 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -427,9 +427,21 @@ static int dev_dax_fault(struct vm_fault *vmf) return dev_dax_huge_fault(vmf, PE_SIZE_PTE); } +static int dev_dax_split(struct vm_area_struct *vma, unsigned long addr) +{ + struct file *filp = vma->vm_file; + struct dev_dax *dev_dax = filp->private_data; + struct dax_region *dax_region = dev_dax->region; + + if (!IS_ALIGNED(addr, dax_region->align)) + return -EINVAL; + return 0; +} + static const struct vm_operations_struct dax_vm_ops = { .fault = dev_dax_fault, .huge_fault = dev_dax_huge_fault, + .split = dev_dax_split, }; static int dax_mmap(struct file *filp, struct vm_area_struct *vma) From 40aa9d2998ca55d9011e75a0fa526ed52e0a4bce Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 29 Nov 2017 16:10:35 -0800 Subject: [PATCH 0009/1013] mm: introduce get_user_pages_longterm commit 2bb6d2837083de722bfdc369cb0d76ce188dd9b4 upstream. Patch series "introduce get_user_pages_longterm()", v2. Here is a new get_user_pages api for cases where a driver intends to keep an elevated page count indefinitely. This is distinct from usages like iov_iter_get_pages where the elevated page counts are transient. The iov_iter_get_pages cases immediately turn around and submit the pages to a device driver which will put_page when the i/o operation completes (under kernel control). In the longterm case userspace is responsible for dropping the page reference at some undefined point in the future. This is untenable for filesystem-dax case where the filesystem is in control of the lifetime of the block / page and needs reasonable limits on how long it can wait for pages in a mapping to become idle. Fixing filesystems to actually wait for dax pages to be idle before blocks from a truncate/hole-punch operation are repurposed is saved for a later patch series. Also, allowing longterm registration of dax mappings is a future patch series that introduces a "map with lease" semantic where the kernel can revoke a lease and force userspace to drop its page references. I have also tagged these for -stable to purposely break cases that might assume that longterm memory registrations for filesystem-dax mappings were supported by the kernel. The behavior regression this policy change implies is one of the reasons we maintain the "dax enabled. Warning: EXPERIMENTAL, use at your own risk" notification when mounting a filesystem in dax mode. It is worth noting the device-dax interface does not suffer the same constraints since it does not support file space management operations like hole-punch. This patch (of 4): Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow long standing memory registrations against filesytem-dax vmas. Device-dax vmas do not have this problem and are explicitly allowed. This is temporary until a "memory registration with layout-lease" mechanism can be implemented for the affected sub-systems (RDMA and V4L2). 
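A sketch of the intended caller-side usage, under 4.14-era locking rules; the pin_user_buffer() wrapper is hypothetical, only get_user_pages_longterm() itself comes from this patch.

    /* Hypothetical driver wrapper around a long-lived pin. */
    static long pin_user_buffer(unsigned long uaddr, unsigned long nr_pages,
                                struct page **pages)
    {
            long pinned;

            down_read(&current->mm->mmap_sem);
            pinned = get_user_pages_longterm(uaddr, nr_pages, FOLL_WRITE,
                                             pages, NULL);
            up_read(&current->mm->mmap_sem);

            /*
             * Filesystem-DAX vmas are refused with -EOPNOTSUPP rather than
             * pinned for an unbounded time; device-dax is still allowed.
             */
            return pinned;
    }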
[akpm@linux-foundation.org: use kcalloc()] Link: http://lkml.kernel.org/r/151068939435.7446.13560129395419350737.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams Suggested-by: Christoph Hellwig Cc: Doug Ledford Cc: Hal Rosenstock Cc: Inki Dae Cc: Jan Kara Cc: Jason Gunthorpe Cc: Jeff Moyer Cc: Joonyoung Shim Cc: Kyungmin Park Cc: Mauro Carvalho Chehab Cc: Mel Gorman Cc: Ross Zwisler Cc: Sean Hefty Cc: Seung-Woo Kim Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/fs.h | 14 ++++++++++ include/linux/mm.h | 13 ++++++++++ mm/gup.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 91 insertions(+) diff --git a/include/linux/fs.h b/include/linux/fs.h index 885266aae2d7..9c6c51b7a644 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3175,6 +3175,20 @@ static inline bool vma_is_dax(struct vm_area_struct *vma) return vma->vm_file && IS_DAX(vma->vm_file->f_mapping->host); } +static inline bool vma_is_fsdax(struct vm_area_struct *vma) +{ + struct inode *inode; + + if (!vma->vm_file) + return false; + if (!vma_is_dax(vma)) + return false; + inode = file_inode(vma->vm_file); + if (inode->i_mode == S_IFCHR) + return false; /* device-dax */ + return true; +} + static inline int iocb_flags(struct file *file) { int res = 0; diff --git a/include/linux/mm.h b/include/linux/mm.h index c96a8b769e3d..db647d428100 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1368,6 +1368,19 @@ long get_user_pages_locked(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, int *locked); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); +#ifdef CONFIG_FS_DAX +long get_user_pages_longterm(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas); +#else +static inline long get_user_pages_longterm(unsigned long start, + unsigned long nr_pages, unsigned int gup_flags, + struct page **pages, struct vm_area_struct **vmas) +{ + return get_user_pages(start, nr_pages, gup_flags, pages, vmas); +} +#endif /* CONFIG_FS_DAX */ + int get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages); diff --git a/mm/gup.c b/mm/gup.c index b2b4d4263768..165ba2174c75 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1095,6 +1095,70 @@ long get_user_pages(unsigned long start, unsigned long nr_pages, } EXPORT_SYMBOL(get_user_pages); +#ifdef CONFIG_FS_DAX +/* + * This is the same as get_user_pages() in that it assumes we are + * operating on the current task's mm, but it goes further to validate + * that the vmas associated with the address range are suitable for + * longterm elevated page reference counts. For example, filesystem-dax + * mappings are subject to the lifetime enforced by the filesystem and + * we need guarantees that longterm users like RDMA and V4L2 only + * establish mappings that have a kernel enforced revocation mechanism. + * + * "longterm" == userspace controlled elevated page count lifetime. + * Contrast this to iov_iter_get_pages() usages which are transient. 
+ */ +long get_user_pages_longterm(unsigned long start, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vm_area_struct **vmas_arg) +{ + struct vm_area_struct **vmas = vmas_arg; + struct vm_area_struct *vma_prev = NULL; + long rc, i; + + if (!pages) + return -EINVAL; + + if (!vmas) { + vmas = kcalloc(nr_pages, sizeof(struct vm_area_struct *), + GFP_KERNEL); + if (!vmas) + return -ENOMEM; + } + + rc = get_user_pages(start, nr_pages, gup_flags, pages, vmas); + + for (i = 0; i < rc; i++) { + struct vm_area_struct *vma = vmas[i]; + + if (vma == vma_prev) + continue; + + vma_prev = vma; + + if (vma_is_fsdax(vma)) + break; + } + + /* + * Either get_user_pages() failed, or the vma validation + * succeeded, in either case we don't need to put_page() before + * returning. + */ + if (i >= rc) + goto out; + + for (i = 0; i < rc; i++) + put_page(pages[i]); + rc = -EOPNOTSUPP; +out: + if (vmas != vmas_arg) + kfree(vmas); + return rc; +} +EXPORT_SYMBOL(get_user_pages_longterm); +#endif /* CONFIG_FS_DAX */ + /** * populate_vma_page_range() - populate a range of pages in the vma. * @vma: target vma From daac045736edf0e4ec21d7b39b3633537246f9c8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 29 Nov 2017 16:10:39 -0800 Subject: [PATCH 0010/1013] mm: fail get_vaddr_frames() for filesystem-dax mappings commit b7f0554a56f21fb3e636a627450a9add030889be upstream. Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow V4L2, Exynos, and other frame vector users to create long standing / irrevocable memory registrations against filesytem-dax vmas. [dan.j.williams@intel.com: add comment for vma_is_fsdax() check in get_vaddr_frames(), per Jan] Link: http://lkml.kernel.org/r/151197874035.26211.4061781453123083667.stgit@dwillia2-desk3.amr.corp.intel.com Link: http://lkml.kernel.org/r/151068939985.7446.15684639617389154187.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams Reviewed-by: Jan Kara Cc: Inki Dae Cc: Seung-Woo Kim Cc: Joonyoung Shim Cc: Kyungmin Park Cc: Mauro Carvalho Chehab Cc: Mel Gorman Cc: Vlastimil Babka Cc: Christoph Hellwig Cc: Doug Ledford Cc: Hal Rosenstock Cc: Jason Gunthorpe Cc: Jeff Moyer Cc: Ross Zwisler Cc: Sean Hefty Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/frame_vector.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/mm/frame_vector.c b/mm/frame_vector.c index 2f98df0d460e..297c7238f7d4 100644 --- a/mm/frame_vector.c +++ b/mm/frame_vector.c @@ -53,6 +53,18 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames, ret = -EFAULT; goto out; } + + /* + * While get_vaddr_frames() could be used for transient (kernel + * controlled lifetime) pinning of memory pages all current + * users establish long term (userspace controlled lifetime) + * page pinning. Treat get_vaddr_frames() like + * get_user_pages_longterm() and disallow it for filesystem-dax + * mappings. + */ + if (vma_is_fsdax(vma)) + return -EOPNOTSUPP; + if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) { vec->got_ref = true; vec->is_pfns = false; From e13ee3368ee1cbf4d2a9a70af20cdf519fd8bb3c Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 29 Nov 2017 16:10:43 -0800 Subject: [PATCH 0011/1013] v4l2: disable filesystem-dax mapping support commit b70131de648c2b997d22f4653934438013f407a1 upstream. 
V4L2 memory registrations are incompatible with filesystem-dax that needs the ability to revoke dma access to a mapping at will, or otherwise allow the kernel to wait for completion of DMA. The filesystem-dax implementation breaks the traditional solution of truncate of active file backed mappings since there is no page-cache page we can orphan to sustain ongoing DMA. If v4l2 wants to support long lived DMA mappings it needs to arrange to hold a file lease or use some other mechanism so that the kernel can coordinate revoking DMA access when the filesystem needs to truncate mappings. Link: http://lkml.kernel.org/r/151068940499.7446.12846708245365671207.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams Reported-by: Jan Kara Reviewed-by: Jan Kara Cc: Mauro Carvalho Chehab Cc: Christoph Hellwig Cc: Doug Ledford Cc: Hal Rosenstock Cc: Inki Dae Cc: Jason Gunthorpe Cc: Jeff Moyer Cc: Joonyoung Shim Cc: Kyungmin Park Cc: Mel Gorman Cc: Ross Zwisler Cc: Sean Hefty Cc: Seung-Woo Kim Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- drivers/media/v4l2-core/videobuf-dma-sg.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c index 0b5c43f7e020..f412429cf5ba 100644 --- a/drivers/media/v4l2-core/videobuf-dma-sg.c +++ b/drivers/media/v4l2-core/videobuf-dma-sg.c @@ -185,12 +185,13 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma, dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n", data, size, dma->nr_pages); - err = get_user_pages(data & PAGE_MASK, dma->nr_pages, + err = get_user_pages_longterm(data & PAGE_MASK, dma->nr_pages, flags, dma->pages, NULL); if (err != dma->nr_pages) { dma->nr_pages = (err >= 0) ? err : 0; - dprintk(1, "get_user_pages: err=%d [%d]\n", err, dma->nr_pages); + dprintk(1, "get_user_pages_longterm: err=%d [%d]\n", err, + dma->nr_pages); return err < 0 ? err : -EINVAL; } return 0; From aab681eb8f3bfe2950c8493f39cef6ae34b731b3 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 29 Nov 2017 16:10:47 -0800 Subject: [PATCH 0012/1013] IB/core: disable memory registration of filesystem-dax vmas commit 5f1d43de54164dcfb9bfa542fcc92c1e1a1b6c1d upstream. Until there is a solution to the dma-to-dax vs truncate problem it is not safe to allow RDMA to create long standing memory registrations against filesytem-dax vmas. 
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com Fixes: 3565fce3a659 ("mm, x86: get_user_pages() for dax mappings") Signed-off-by: Dan Williams Reported-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Acked-by: Jason Gunthorpe Acked-by: Doug Ledford Cc: Sean Hefty Cc: Hal Rosenstock Cc: Jeff Moyer Cc: Ross Zwisler Cc: Inki Dae Cc: Jan Kara Cc: Joonyoung Shim Cc: Kyungmin Park Cc: Mauro Carvalho Chehab Cc: Mel Gorman Cc: Seung-Woo Kim Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/umem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 21e60b1e2ff4..130606c3b07c 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -191,7 +191,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, sg_list_start = umem->sg_head.sgl; while (npages) { - ret = get_user_pages(cur_base, + ret = get_user_pages_longterm(cur_base, min_t(unsigned long, npages, PAGE_SIZE / sizeof (struct page *)), gup_flags, page_list, vma_list); From c16a65582a72d3adea145b9b0a737c6c35ce89fc Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 29 Nov 2017 16:10:51 -0800 Subject: [PATCH 0013/1013] exec: avoid RLIMIT_STACK races with prlimit() commit 04e35f4495dd560db30c25efca4eecae8ec8c375 upstream. While the defense-in-depth RLIMIT_STACK limit on setuid processes was protected against races from other threads calling setrlimit(), I missed protecting it against races from external processes calling prlimit(). This adds locking around the change and makes sure that rlim_max is set too. Link: http://lkml.kernel.org/r/20171127193457.GA11348@beast Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec") Signed-off-by: Kees Cook Reported-by: Ben Hutchings Reported-by: Brad Spengler Acked-by: Serge Hallyn Cc: James Morris Cc: Andy Lutomirski Cc: Oleg Nesterov Cc: Jiri Slaby Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/exec.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/exec.c b/fs/exec.c index 3e14ba25f678..4726c777dd38 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1340,10 +1340,15 @@ void setup_new_exec(struct linux_binprm * bprm) * avoid bad behavior from the prior rlimits. This has to * happen before arch_pick_mmap_layout(), which examines * RLIMIT_STACK, but after the point of no return to avoid - * needing to clean up the change on failure. + * races from other threads changing the limits. This also + * must be protected from races with prlimit() calls. */ + task_lock(current->group_leader); if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM) current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM; + if (current->signal->rlim[RLIMIT_STACK].rlim_max > _STK_LIM) + current->signal->rlim[RLIMIT_STACK].rlim_max = _STK_LIM; + task_unlock(current->group_leader); } arch_pick_mmap_layout(current->mm); From 8a0bb9ebaa8b8faee61f095757662fe5d7fd8da6 Mon Sep 17 00:00:00 2001 From: chenjie Date: Wed, 29 Nov 2017 16:10:54 -0800 Subject: [PATCH 0014/1013] mm/madvise.c: fix madvise() infinite loop under special circumstances commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream. MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings. 
Unfortunately madvise_willneed() doesn't communicate this information properly to the generic madvise syscall implementation. The calling convention is quite subtle there. madvise_vma() is supposed to either return an error or update &prev otherwise the main loop will never advance to the next vma and it will keep looping for ever without a way to get out of the kernel. It seems this has been broken since introduction. Nobody has noticed because nobody seems to be using MADVISE_WILLNEED on these DAX mappings. [mhocko@suse.com: rewrite changelog] Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com Fixes: fe77ba6f4f97 ("[PATCH] xip: madvice/fadvice: execute in place") Signed-off-by: chenjie Signed-off-by: guoxuenan Acked-by: Michal Hocko Cc: Minchan Kim Cc: zhangyi (F) Cc: Miao Xie Cc: Mike Rapoport Cc: Shaohua Li Cc: Andrea Arcangeli Cc: Mel Gorman Cc: Kirill A. Shutemov Cc: David Rientjes Cc: Anshuman Khandual Cc: Rik van Riel Cc: Carsten Otte Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/madvise.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 375cf32087e4..751e97aa2210 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -276,15 +276,14 @@ static long madvise_willneed(struct vm_area_struct *vma, { struct file *file = vma->vm_file; + *prev = vma; #ifdef CONFIG_SWAP if (!file) { - *prev = vma; force_swapin_readahead(vma, start, end); return 0; } if (shmem_mapping(file->f_mapping)) { - *prev = vma; force_shm_swapin_readahead(vma, start, end, file->f_mapping); return 0; @@ -299,7 +298,6 @@ static long madvise_willneed(struct vm_area_struct *vma, return 0; } - *prev = vma; start = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; if (end > vma->vm_end) end = vma->vm_end; From 13167cf417ddf105d2822eebe775547ed35d1a08 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Wed, 29 Nov 2017 16:11:12 -0800 Subject: [PATCH 0015/1013] mm: migrate: fix an incorrect call of prep_transhuge_page() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 40a899ed16486455f964e46d1af31fd4fded21c1 upstream. In https://lkml.org/lkml/2017/11/20/411, Andrea reported that during memory hotplug/hot remove prep_transhuge_page() is called incorrectly on non-THP pages for migration, when THP is on but THP migration is not enabled. This leads to a bad state of target pages for migration. By inspecting the code, if called on a non-THP, prep_transhuge_page() will 1) change the value of the mapping of (page + 2), since it is used for THP deferred list; 2) change the lru value of (page + 1), since it is used for THP's dtor. Both can lead to data corruption of these two pages. Andrea said: "Pragmatically and from the point of view of the memory_hotplug subsys, the effect is a kernel crash when pages are being migrated during a memory hot remove offline and migration target pages are found in a bad state" This patch fixes it by only calling prep_transhuge_page() when we are certain that the target page is THP. 
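Returning to the madvise() fix above, the loop contract it restores can be shown with a small userspace toy: every handler must record its progress in *prev, because the main loop derives the next vma from prev, and a handler that forgets to do so leaves the kernel spinning on the same vma forever.

    #include <stdio.h>
    #include <stddef.h>

    struct vma { long start, end; struct vma *next; };

    /* Stand-in for madvise_willneed(): with the fix it always records *prev. */
    static void visit(struct vma *v, struct vma **prev)
    {
        *prev = v;
    }

    int main(void)
    {
        struct vma b = { 4096, 8192, NULL };
        struct vma a = { 0, 4096, &b };
        struct vma *vma = &a, *prev = NULL;

        while (vma) {
            visit(vma, &prev);
            printf("advised [%ld, %ld)\n", vma->start, vma->end);
            vma = prev->next;   /* the kernel loop advances the same way */
        }
        return 0;
    }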
Link: http://lkml.kernel.org/r/20171121021855.50525-1-zi.yan@sent.com Fixes: 8135d8926c08 ("mm: memory_hotplug: memory hotremove supports thp migration") Signed-off-by: Zi Yan Reported-by: Andrea Reale Cc: Naoya Horiguchi Cc: Michal Hocko Cc: "Jérôme Glisse" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/migrate.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 895ec0c4942e..a2246cf670ba 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -54,7 +54,7 @@ static inline struct page *new_page_nodemask(struct page *page, new_page = __alloc_pages_nodemask(gfp_mask, order, preferred_nid, nodemask); - if (new_page && PageTransHuge(page)) + if (new_page && PageTransHuge(new_page)) prep_transhuge_page(new_page); return new_page; From d983b6251c5535b5c5eb02ce48bf8eeb7417ef65 Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Wed, 29 Nov 2017 16:11:15 -0800 Subject: [PATCH 0016/1013] mm, memcg: fix mem_cgroup_swapout() for THPs commit d08afa149acfd00871484ada6dabc3880524cd1c upstream. Commit d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP") changed mem_cgroup_swapout() to support transparent huge page (THP). However the patch missed one location which should be changed for correctly handling THPs. The resulting bug will cause the memory cgroups whose THPs were swapped out to become zombies on deletion. Link: http://lkml.kernel.org/r/20171128161941.20931-1-shakeelb@google.com Fixes: d6810d730022 ("memcg, THP, swap: make mem_cgroup_swapout() support THP") Signed-off-by: Shakeel Butt Acked-by: Johannes Weiner Acked-by: Michal Hocko Cc: Huang Ying Cc: Vladimir Davydov Cc: Greg Thelen Signed-off-by: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/memcontrol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 661f046ad318..53f7c919b916 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6044,7 +6044,7 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry) memcg_check_events(memcg, page); if (!mem_cgroup_is_root(memcg)) - css_put(&memcg->css); + css_put_many(&memcg->css, nr_entries); } /** From fa2a9894733bdeaa93c750735b7fdc581ef10024 Mon Sep 17 00:00:00 2001 From: OGAWA Hirofumi Date: Wed, 29 Nov 2017 16:11:19 -0800 Subject: [PATCH 0017/1013] fs/fat/inode.c: fix sb_rdonly() change commit b6e8e12c0aeb5fbf1bf46c84d58cc93aedede385 upstream. Commit bc98a42c1f7d ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)") converted fat_remount():new_rdonly from a bool to an int. However fat_remount() depends upon the compiler's conversion of a non-zero integer into boolean `true'. Fix it by switching `new_rdonly' back into a bool. 
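The one-character type change above is subtle, so here is a self-contained illustration of why the masked flag value wants to pass through bool before being compared with sb_rdonly(). The 0x4 flag is a stand-in chosen to make the mismatch visible; MS_RDONLY itself happens to be bit 0.

    #include <stdbool.h>
    #include <stdio.h>

    #define SOME_FLAG 0x4   /* stand-in for a mount flag that is not bit 0 */

    int main(void)
    {
        int  as_int  = 0xff & SOME_FLAG;   /* 4: truthy, but not equal to 1 */
        bool as_bool = 0xff & SOME_FLAG;   /* normalised to 1 by the bool   */

        printf("int  matches a bool: %s\n", as_int  == true ? "yes" : "no");
        printf("bool matches a bool: %s\n", as_bool == true ? "yes" : "no");
        return 0;
    }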
Link: http://lkml.kernel.org/r/87mv3d5x51.fsf@mail.parknet.co.jp Fixes: bc98a42c1f7d0f8 ("VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)") Signed-off-by: OGAWA Hirofumi Cc: Joe Perches Cc: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/fat/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/fat/inode.c b/fs/fat/inode.c index 30c52394a7ad..c7a4dee206b9 100644 --- a/fs/fat/inode.c +++ b/fs/fat/inode.c @@ -779,7 +779,7 @@ static void __exit fat_destroy_inodecache(void) static int fat_remount(struct super_block *sb, int *flags, char *data) { - int new_rdonly; + bool new_rdonly; struct msdos_sb_info *sbi = MSDOS_SB(sb); *flags |= MS_NODIRATIME | (sbi->options.isvfat ? 0 : MS_NOATIME); From 793a07bef9eb557ebad28cdf9ca30ba41d082c9f Mon Sep 17 00:00:00 2001 From: Ian Kent Date: Wed, 29 Nov 2017 16:11:23 -0800 Subject: [PATCH 0018/1013] autofs: revert "autofs: take more care to not update last_used on path walk" commit 43694d4bf843ddd34519e8e9de983deefeada699 upstream. While commit 092a53452bb7 ("autofs: take more care to not update last_used on path walk") helped (partially) resolve a problem where automounts were not expiring due to aggressive accesses from user space it has a side effect for very large environments. This change helps with the expire problem by making the expire more aggressive but, for very large environments, that means more mount requests from clients. When there are a lot of clients that can mean fairly significant server load increases. It turns out I put the last_used in this position to solve this very problem and failed to update my own thinking of the autofs expire policy. So the patch being reverted introduces a regression which should be fixed. 
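The expiry policy being weighed above boils down to when last_used is refreshed; a toy version of the check makes the trade-off visible (abstract ticks stand in for jiffies, and the struct below is not the real autofs_info).

    #include <stdbool.h>
    #include <stdio.h>

    struct mnt { unsigned long last_used; };

    static bool expired(const struct mnt *m, unsigned long now, unsigned long timeout)
    {
        return now - m->last_used >= timeout;
    }

    int main(void)
    {
        struct mnt m = { .last_used = 100 };

        printf("idle mount expired:   %d\n", expired(&m, 400, 300));  /* 1 */
        m.last_used = 390;   /* a recent walk refreshed the timestamp */
        printf("active mount expired: %d\n", expired(&m, 400, 300));  /* 0 */
        return 0;
    }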
Link: http://lkml.kernel.org/r/151174729420.6162.1832622523537052460.stgit@pluto.themaw.net Fixes: 092a53452b ("autofs: take more care to not update last_used on path walk") Signed-off-by: Ian Kent Reviewed-by: NeilBrown Cc: Al Viro Cc: Colin Walters Cc: David Howells Cc: Ondrej Holy Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/autofs4/root.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c index d79ced925861..82e8f6edfb48 100644 --- a/fs/autofs4/root.c +++ b/fs/autofs4/root.c @@ -281,8 +281,8 @@ static int autofs4_mount_wait(const struct path *path, bool rcu_walk) pr_debug("waiting for mount name=%pd\n", path->dentry); status = autofs4_wait(sbi, path, NFY_MOUNT); pr_debug("mount wait done status=%d\n", status); - ino->last_used = jiffies; } + ino->last_used = jiffies; return status; } @@ -321,21 +321,16 @@ static struct dentry *autofs4_mountpoint_changed(struct path *path) */ if (autofs_type_indirect(sbi->type) && d_unhashed(dentry)) { struct dentry *parent = dentry->d_parent; + struct autofs_info *ino; struct dentry *new; new = d_lookup(parent, &dentry->d_name); if (!new) return NULL; - if (new == dentry) - dput(new); - else { - struct autofs_info *ino; - - ino = autofs4_dentry_ino(new); - ino->last_used = jiffies; - dput(path->dentry); - path->dentry = new; - } + ino = autofs4_dentry_ino(new); + ino->last_used = jiffies; + dput(path->dentry); + path->dentry = new; } return path->dentry; } From f354f4ffe6a371a4ae27a682cea56a7e4e1d74d9 Mon Sep 17 00:00:00 2001 From: Ian Kent Date: Wed, 29 Nov 2017 16:11:26 -0800 Subject: [PATCH 0019/1013] autofs: revert "autofs: fix AT_NO_AUTOMOUNT not being honored" commit 5d38f049cee1e1c4a7ac55aa79d37d01ddcc3860 upstream. Commit 42f461482178 ("autofs: fix AT_NO_AUTOMOUNT not being honored") allowed the fstatat(2) system call to properly honor the AT_NO_AUTOMOUNT flag but introduced a semantic change. In order to honor AT_NO_AUTOMOUNT a semantic change was made to the negative dentry case for stat family system calls in follow_automount(). This changed the unconditional triggering of an automount in this case to no longer be done and an error returned instead. This has caused more problems than I expected so reverting the change is needed. In a discussion with Neil Brown it was concluded that the automount(8) daemon can implement this change without kernel modifications. So that will be done instead and the autofs module documentation updated with a description of the problem and what needs to be done by module users for this specific case. Link: http://lkml.kernel.org/r/151174730120.6162.3848002191530283984.stgit@pluto.themaw.net Fixes: 42f4614821 ("autofs: fix AT_NO_AUTOMOUNT not being honored") Signed-off-by: Ian Kent Cc: Neil Brown Cc: Al Viro Cc: David Howells Cc: Colin Walters Cc: Ondrej Holy Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/namei.c | 15 +++------------ include/linux/fs.h | 3 ++- 2 files changed, 5 insertions(+), 13 deletions(-) diff --git a/fs/namei.c b/fs/namei.c index ed8b9488a890..62a0db6e6725 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1129,18 +1129,9 @@ static int follow_automount(struct path *path, struct nameidata *nd, * of the daemon to instantiate them before they can be used. 
*/ if (!(nd->flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY | - LOOKUP_OPEN | LOOKUP_CREATE | - LOOKUP_AUTOMOUNT))) { - /* Positive dentry that isn't meant to trigger an - * automount, EISDIR will allow it to be used, - * otherwise there's no mount here "now" so return - * ENOENT. - */ - if (path->dentry->d_inode) - return -EISDIR; - else - return -ENOENT; - } + LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) && + path->dentry->d_inode) + return -EISDIR; if (path->dentry->d_sb->s_user_ns != &init_user_ns) return -EACCES; diff --git a/include/linux/fs.h b/include/linux/fs.h index 9c6c51b7a644..440281f8564d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3069,7 +3069,8 @@ static inline int vfs_lstat(const char __user *name, struct kstat *stat) static inline int vfs_fstatat(int dfd, const char __user *filename, struct kstat *stat, int flags) { - return vfs_statx(dfd, filename, flags, stat, STATX_BASIC_STATS); + return vfs_statx(dfd, filename, flags | AT_NO_AUTOMOUNT, + stat, STATX_BASIC_STATS); } static inline int vfs_fstat(int fd, struct kstat *stat) { From be75ad849b9f3b0512e976a81e31a0beea1419ab Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Wed, 29 Nov 2017 16:11:30 -0800 Subject: [PATCH 0020/1013] mm/hugetlb: fix NULL-pointer dereference on 5-level paging machine commit f4f0a3d85b50a65a348e2b8635041d6b30f01deb upstream. I made a mistake during converting hugetlb code to 5-level paging: in huge_pte_alloc() we have to use p4d_alloc(), not p4d_offset(). Otherwise it leads to crash -- NULL-pointer dereference in pud_alloc() if p4d table is not yet allocated. It only can happen in 5-level paging mode. In 4-level paging mode p4d_offset() always returns pgd, so we are fine. Link: http://lkml.kernel.org/r/20171122121921.64822-1-kirill.shutemov@linux.intel.com Fixes: c2febafc6773 ("mm: convert generic code to 5-level paging") Signed-off-by: Kirill A. Shutemov Acked-by: Vlastimil Babka Acked-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/hugetlb.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a2233d722ff9..c539941671b4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4625,7 +4625,9 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, pte_t *pte = NULL; pgd = pgd_offset(mm, addr); - p4d = p4d_offset(pgd, addr); + p4d = p4d_alloc(mm, pgd, addr); + if (!p4d) + return NULL; pud = pud_alloc(mm, p4d, addr); if (pud) { if (sz == PUD_SIZE) { From 28580e75734b8bfc10d0c23ba026737c71c551c8 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Fri, 17 Nov 2017 14:50:46 -0500 Subject: [PATCH 0021/1013] btrfs: clear space cache inode generation always commit 8e138e0d92c6c9d3d481674fb14e3439b495be37 upstream. We discovered a box that had double allocations, and suspected the space cache may be to blame. While auditing the write out path I noticed that if we've already setup the space cache we will just carry on. This means that any error we hit after cache_save_setup before we go to actually write the cache out we won't reset the inode generation, so whatever was already written will be considered correct, except it'll be stale. Fix this by _always_ resetting the generation on the block group inode, this way we only ever have valid or invalid cache. With this patch I was no longer able to reproduce cache corruption with dm-log-writes and my bpf error injection tool. 
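Stepping back to the mm/hugetlb fix earlier in this group, the corrected walk is worth restating with comments, since the whole bug is the difference between merely looking up a table and allocating it; kernel-internal code, shown for illustration only.

    pgd = pgd_offset(mm, addr);      /* the top level always exists            */
    p4d = p4d_alloc(mm, pgd, addr);  /* must allocate: on 5-level machines the
                                      * p4d table may not exist yet, and using
                                      * p4d_offset() here led to the NULL
                                      * dereference inside pud_alloc()         */
    if (!p4d)
            return NULL;
    pud = pud_alloc(mm, p4d, addr);  /* safe now that the p4d is populated     */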
Signed-off-by: Josef Bacik Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/extent-tree.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 08698105fa4a..e4774c02d922 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -3526,13 +3526,6 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group, goto again; } - /* We've already setup this transaction, go ahead and exit */ - if (block_group->cache_generation == trans->transid && - i_size_read(inode)) { - dcs = BTRFS_DC_SETUP; - goto out_put; - } - /* * We want to set the generation to 0, that way if anything goes wrong * from here on out we know not to trust this cache when we load up next @@ -3556,6 +3549,13 @@ static int cache_save_setup(struct btrfs_block_group_cache *block_group, } WARN_ON(ret); + /* We've already setup this transaction, go ahead and exit */ + if (block_group->cache_generation == trans->transid && + i_size_read(inode)) { + dcs = BTRFS_DC_SETUP; + goto out_put; + } + if (i_size_read(inode) > 0) { ret = btrfs_check_trunc_cache_free_space(fs_info, &fs_info->global_block_rsv); From db77ab54a5e2f740d1ae39326f90c06d5c5a26d4 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 3 Nov 2017 08:00:10 -0400 Subject: [PATCH 0022/1013] nfsd: Fix stateid races between OPEN and CLOSE commit 15ca08d3299682dc49bad73251677b2c5017ef08 upstream. Open file stateids can linger on the nfs4_file list of stateids even after they have been closed. In order to avoid reusing such a stateid, and confusing the client, we need to recheck the nfs4_stid's type after taking the mutex. Otherwise, we risk reusing an old stateid that was already closed, which will confuse clients that expect new stateids to conform to RFC7530 Sections 9.1.4.2 and 16.2.5 or RFC5661 Sections 8.2.2 and 18.2.4. Signed-off-by: Trond Myklebust Signed-off-by: J. 
Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfs4state.c | 67 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 59 insertions(+), 8 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index d386d569edbc..e5e66e505052 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -3512,7 +3512,9 @@ nfsd4_find_existing_open(struct nfs4_file *fp, struct nfsd4_open *open) /* ignore lock owners */ if (local->st_stateowner->so_is_open_owner == 0) continue; - if (local->st_stateowner == &oo->oo_owner) { + if (local->st_stateowner != &oo->oo_owner) + continue; + if (local->st_stid.sc_type == NFS4_OPEN_STID) { ret = local; atomic_inc(&ret->st_stid.sc_count); break; @@ -3521,6 +3523,52 @@ nfsd4_find_existing_open(struct nfs4_file *fp, struct nfsd4_open *open) return ret; } +static __be32 +nfsd4_verify_open_stid(struct nfs4_stid *s) +{ + __be32 ret = nfs_ok; + + switch (s->sc_type) { + default: + break; + case NFS4_CLOSED_STID: + case NFS4_CLOSED_DELEG_STID: + ret = nfserr_bad_stateid; + break; + case NFS4_REVOKED_DELEG_STID: + ret = nfserr_deleg_revoked; + } + return ret; +} + +/* Lock the stateid st_mutex, and deal with races with CLOSE */ +static __be32 +nfsd4_lock_ol_stateid(struct nfs4_ol_stateid *stp) +{ + __be32 ret; + + mutex_lock(&stp->st_mutex); + ret = nfsd4_verify_open_stid(&stp->st_stid); + if (ret != nfs_ok) + mutex_unlock(&stp->st_mutex); + return ret; +} + +static struct nfs4_ol_stateid * +nfsd4_find_and_lock_existing_open(struct nfs4_file *fp, struct nfsd4_open *open) +{ + struct nfs4_ol_stateid *stp; + for (;;) { + spin_lock(&fp->fi_lock); + stp = nfsd4_find_existing_open(fp, open); + spin_unlock(&fp->fi_lock); + if (!stp || nfsd4_lock_ol_stateid(stp) == nfs_ok) + break; + nfs4_put_stid(&stp->st_stid); + } + return stp; +} + static struct nfs4_openowner * alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open, struct nfsd4_compound_state *cstate) @@ -3565,6 +3613,7 @@ init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open) mutex_init(&stp->st_mutex); mutex_lock(&stp->st_mutex); +retry: spin_lock(&oo->oo_owner.so_client->cl_lock); spin_lock(&fp->fi_lock); @@ -3589,7 +3638,11 @@ init_open_stateid(struct nfs4_file *fp, struct nfsd4_open *open) spin_unlock(&fp->fi_lock); spin_unlock(&oo->oo_owner.so_client->cl_lock); if (retstp) { - mutex_lock(&retstp->st_mutex); + /* Handle races with CLOSE */ + if (nfsd4_lock_ol_stateid(retstp) != nfs_ok) { + nfs4_put_stid(&retstp->st_stid); + goto retry; + } /* To keep mutex tracking happy */ mutex_unlock(&stp->st_mutex); stp = retstp; @@ -4410,9 +4463,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf status = nfs4_check_deleg(cl, open, &dp); if (status) goto out; - spin_lock(&fp->fi_lock); - stp = nfsd4_find_existing_open(fp, open); - spin_unlock(&fp->fi_lock); + stp = nfsd4_find_and_lock_existing_open(fp, open); } else { open->op_file = NULL; status = nfserr_bad_stateid; @@ -4426,7 +4477,6 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf */ if (stp) { /* Stateid was found, this is an OPEN upgrade */ - mutex_lock(&stp->st_mutex); status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { mutex_unlock(&stp->st_mutex); @@ -5317,7 +5367,6 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s) bool unhashed; LIST_HEAD(reaplist); - s->st_stid.sc_type = NFS4_CLOSED_STID; spin_lock(&clp->cl_lock); unhashed = unhash_open_stateid(s, &reaplist); @@ -5357,10 +5406,12 @@ nfsd4_close(struct 
svc_rqst *rqstp, struct nfsd4_compound_state *cstate, nfsd4_bump_seqid(cstate, status); if (status) goto out; + + stp->st_stid.sc_type = NFS4_CLOSED_STID; nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid); - mutex_unlock(&stp->st_mutex); nfsd4_close_open_stateid(stp); + mutex_unlock(&stp->st_mutex); /* put reference from nfs4_preprocess_seqid_op */ nfs4_put_stid(&stp->st_stid); From 73cfeab6755ccd442b689ea15769b7e9342c471e Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 3 Nov 2017 08:00:11 -0400 Subject: [PATCH 0023/1013] nfsd: Fix another OPEN stateid race commit d8a1a000555ecd1b824ac1ed6df8fe364dfbbbb0 upstream. If nfsd4_process_open2() is initialising a new stateid, and yet the call to nfs4_get_vfs_file() fails for some reason, then we must declare the stateid closed, and unhash it before dropping the mutex. Right now, we unhash the stateid after dropping the mutex, and without changing the stateid type, meaning that another OPEN could theoretically look it up and attempt to use it. Reported-by: Andrew W Elble Signed-off-by: Trond Myklebust Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfs4state.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index e5e66e505052..30f35bc4dcf6 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4452,6 +4452,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf struct nfs4_ol_stateid *stp = NULL; struct nfs4_delegation *dp = NULL; __be32 status; + bool new_stp = false; /* * Lookup file; if found, lookup stateid and check open request, @@ -4471,11 +4472,19 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf goto out; } + if (!stp) { + stp = init_open_stateid(fp, open); + if (!open->op_stp) + new_stp = true; + } + /* * OPEN the file, or upgrade an existing OPEN. * If truncate fails, the OPEN fails. + * + * stp is already locked. */ - if (stp) { + if (!new_stp) { /* Stateid was found, this is an OPEN upgrade */ status = nfs4_upgrade_open(rqstp, fp, current_fh, stp, open); if (status) { @@ -4483,22 +4492,11 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf goto out; } } else { - /* stp is returned locked. */ - stp = init_open_stateid(fp, open); - /* See if we lost the race to some other thread */ - if (stp->st_access_bmap != 0) { - status = nfs4_upgrade_open(rqstp, fp, current_fh, - stp, open); - if (status) { - mutex_unlock(&stp->st_mutex); - goto out; - } - goto upgrade_out; - } status = nfs4_get_vfs_file(rqstp, fp, current_fh, stp, open); if (status) { - mutex_unlock(&stp->st_mutex); + stp->st_stid.sc_type = NFS4_CLOSED_STID; release_open_stateid(stp); + mutex_unlock(&stp->st_mutex); goto out; } @@ -4507,7 +4505,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf if (stp->st_clnt_odstate == open->op_odstate) open->op_odstate = NULL; } -upgrade_out: + nfs4_inc_and_copy_stateid(&open->op_stateid, &stp->st_stid); mutex_unlock(&stp->st_mutex); From b78da96d2882bea2ec7184dc1eba3e07e829364d Mon Sep 17 00:00:00 2001 From: Naofumi Honda Date: Thu, 9 Nov 2017 10:57:16 -0500 Subject: [PATCH 0024/1013] nfsd: fix panic in posix_unblock_lock called from nfs4_laundromat commit 64ebe12494fd5d193f014ce38e1fd83cc57883c8 upstream. From kernel 4.9, my two nfsv4 servers sometimes suffer from "panic: unable to handle kernel page request" in posix_unblock_lock() called from nfs4_laundromat(). 
These panics disappear if we revert the commit "nfsd: add a LRU list for blocked locks". The cause appears to be a typo in nfs4_laundromat(), which is also present in nfs4_state_shutdown_net(). Fixes: 7919d0a27f1e "nfsd: add a LRU list for blocked locks" Cc: jlayton@redhat.com Reviewed-by: Jeff Layton Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/nfsd/nfs4state.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c index 30f35bc4dcf6..a439a70177a4 100644 --- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -4732,7 +4732,7 @@ nfs4_laundromat(struct nfsd_net *nn) spin_unlock(&nn->blocked_locks_lock); while (!list_empty(&reaplist)) { - nbl = list_first_entry(&nn->blocked_locks_lru, + nbl = list_first_entry(&reaplist, struct nfsd4_blocked_lock, nbl_lru); list_del_init(&nbl->nbl_lru); posix_unblock_lock(&nbl->nbl_lock); @@ -7152,7 +7152,7 @@ nfs4_state_shutdown_net(struct net *net) spin_unlock(&nn->blocked_locks_lock); while (!list_empty(&reaplist)) { - nbl = list_first_entry(&nn->blocked_locks_lru, + nbl = list_first_entry(&reaplist, struct nfsd4_blocked_lock, nbl_lru); list_del_init(&nbl->nbl_lru); posix_unblock_lock(&nbl->nbl_lock); From 721872a1997c4b8aae80ed2e6ef9dc05c53383f6 Mon Sep 17 00:00:00 2001 From: Stephan Mueller Date: Fri, 10 Nov 2017 11:04:52 +0100 Subject: [PATCH 0025/1013] crypto: algif_aead - skip SGL entries with NULL page commit 8e1fa89aa8bc2870009b4486644e4a58f2e2a4f5 upstream. The TX SGL may contain SGL entries that are assigned a NULL page. This may happen if a multi-stage AIO operation is performed where the data for each stage is pointed to by one SGL entry. Upon completion of that stage, af_alg_pull_tsgl will assign NULL to the SGL entry. The NULL cipher used to copy the AAD from TX SGL to the destination buffer, however, cannot handle the case where the SGL starts with an SGL entry having a NULL page. Thus, the code needs to advance the start pointer into the SGL to the first non-NULL entry. This fixes a crash visible on Intel x86 32 bit using the libkcapi test suite.
Fixes: 72548b093ee38 ("crypto: algif_aead - copy AAD from src to dst") Signed-off-by: Stephan Mueller Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/algif_aead.c | 33 ++++++++++++++++++++++++--------- 1 file changed, 24 insertions(+), 9 deletions(-) diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index 516b38c3a169..ac8f101ab29c 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -101,10 +101,10 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, struct aead_tfm *aeadc = pask->private; struct crypto_aead *tfm = aeadc->aead; struct crypto_skcipher *null_tfm = aeadc->null_tfm; - unsigned int as = crypto_aead_authsize(tfm); + unsigned int i, as = crypto_aead_authsize(tfm); struct af_alg_async_req *areq; - struct af_alg_tsgl *tsgl; - struct scatterlist *src; + struct af_alg_tsgl *tsgl, *tmp; + struct scatterlist *rsgl_src, *tsgl_src = NULL; int err = 0; size_t used = 0; /* [in] TX bufs to be en/decrypted */ size_t outlen = 0; /* [out] RX bufs produced by kernel */ @@ -178,7 +178,22 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, } processed = used + ctx->aead_assoclen; - tsgl = list_first_entry(&ctx->tsgl_list, struct af_alg_tsgl, list); + list_for_each_entry_safe(tsgl, tmp, &ctx->tsgl_list, list) { + for (i = 0; i < tsgl->cur; i++) { + struct scatterlist *process_sg = tsgl->sg + i; + + if (!(process_sg->length) || !sg_page(process_sg)) + continue; + tsgl_src = process_sg; + break; + } + if (tsgl_src) + break; + } + if (processed && !tsgl_src) { + err = -EFAULT; + goto free; + } /* * Copy of AAD from source to destination @@ -194,7 +209,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, */ /* Use the RX SGL as source (and destination) for crypto op. */ - src = areq->first_rsgl.sgl.sg; + rsgl_src = areq->first_rsgl.sgl.sg; if (ctx->enc) { /* @@ -207,7 +222,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, * v v * RX SGL: AAD || PT || Tag */ - err = crypto_aead_copy_sgl(null_tfm, tsgl->sg, + err = crypto_aead_copy_sgl(null_tfm, tsgl_src, areq->first_rsgl.sgl.sg, processed); if (err) goto free; @@ -225,7 +240,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, */ /* Copy AAD || CT to RX SGL buffer for in-place operation. */ - err = crypto_aead_copy_sgl(null_tfm, tsgl->sg, + err = crypto_aead_copy_sgl(null_tfm, tsgl_src, areq->first_rsgl.sgl.sg, outlen); if (err) goto free; @@ -257,11 +272,11 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, areq->tsgl); } else /* no RX SGL present (e.g. authentication only) */ - src = areq->tsgl; + rsgl_src = areq->tsgl; } /* Initialize the crypto operation */ - aead_request_set_crypt(&areq->cra_u.aead_req, src, + aead_request_set_crypt(&areq->cra_u.aead_req, rsgl_src, areq->first_rsgl.sgl.sg, used, ctx->iv); aead_request_set_ad(&areq->cra_u.aead_req, ctx->aead_assoclen); aead_request_set_tfm(&areq->cra_u.aead_req, tfm); From 7f21961a2d7bfe45f734eab5b0ce6a40ee128b17 Mon Sep 17 00:00:00 2001 From: Stephan Mueller Date: Fri, 10 Nov 2017 13:20:55 +0100 Subject: [PATCH 0026/1013] crypto: af_alg - remove locking in async callback commit 7d2c3f54e6f646887d019faa45f35d6fe9fe82ce upstream. The code paths protected by the socket-lock do not use or modify the socket in a non-atomic fashion. The actions pertaining the socket do not even need to be handled as an atomic operation. Thus, the socket-lock can be safely ignored. 
This fixes a bug regarding scheduling in atomic as the callback function may be invoked in interrupt context. In addition, the sock_hold is moved before the AIO encrypt/decrypt operation to ensure that the socket is always present. This avoids a tiny race window where the socket is unprotected and yet used by the AIO operation. Finally, the release of resources for a crypto operation is moved into a common function of af_alg_free_resources. Fixes: e870456d8e7c8 ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae43 ("crypto: algif_aead - overhaul memory management") Reported-by: Romain Izard Signed-off-by: Stephan Mueller Tested-by: Romain Izard Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/af_alg.c | 21 ++++++++++++++------- crypto/algif_aead.c | 23 ++++++++++++----------- crypto/algif_skcipher.c | 23 ++++++++++++----------- include/crypto/if_alg.h | 1 + 4 files changed, 39 insertions(+), 29 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index 337cf382718e..a72659f452a5 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1047,6 +1047,18 @@ ssize_t af_alg_sendpage(struct socket *sock, struct page *page, } EXPORT_SYMBOL_GPL(af_alg_sendpage); +/** + * af_alg_free_resources - release resources required for crypto request + */ +void af_alg_free_resources(struct af_alg_async_req *areq) +{ + struct sock *sk = areq->sk; + + af_alg_free_areq_sgls(areq); + sock_kfree_s(sk, areq, areq->areqlen); +} +EXPORT_SYMBOL_GPL(af_alg_free_resources); + /** * af_alg_async_cb - AIO callback handler * @@ -1063,18 +1075,13 @@ void af_alg_async_cb(struct crypto_async_request *_req, int err) struct kiocb *iocb = areq->iocb; unsigned int resultlen; - lock_sock(sk); - /* Buffer size written by crypto operation. */ resultlen = areq->outlen; - af_alg_free_areq_sgls(areq); - sock_kfree_s(sk, areq, areq->areqlen); - __sock_put(sk); + af_alg_free_resources(areq); + sock_put(sk); iocb->ki_complete(iocb, err ? err : resultlen, 0); - - release_sock(sk); } EXPORT_SYMBOL_GPL(af_alg_async_cb); diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index ac8f101ab29c..d0b45145cb30 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -283,12 +283,23 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) { /* AIO operation */ + sock_hold(sk); areq->iocb = msg->msg_iocb; aead_request_set_callback(&areq->cra_u.aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG, af_alg_async_cb, areq); err = ctx->enc ? crypto_aead_encrypt(&areq->cra_u.aead_req) : crypto_aead_decrypt(&areq->cra_u.aead_req); + + /* AIO operation in progress */ + if (err == -EINPROGRESS || err == -EBUSY) { + /* Remember output size that will be generated. */ + areq->outlen = outlen; + + return -EIOCBQUEUED; + } + + sock_put(sk); } else { /* Synchronous operation */ aead_request_set_callback(&areq->cra_u.aead_req, @@ -300,19 +311,9 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, &ctx->completion); } - /* AIO operation in progress */ - if (err == -EINPROGRESS) { - sock_hold(sk); - - /* Remember output size that will be generated. */ - areq->outlen = outlen; - - return -EIOCBQUEUED; - } free: - af_alg_free_areq_sgls(areq); - sock_kfree_s(sk, areq, areq->areqlen); + af_alg_free_resources(areq); return err ? 
err : outlen; } diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index 8ae4170aaeb4..30ee2a8e8f42 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -117,6 +117,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) { /* AIO operation */ + sock_hold(sk); areq->iocb = msg->msg_iocb; skcipher_request_set_callback(&areq->cra_u.skcipher_req, CRYPTO_TFM_REQ_MAY_SLEEP, @@ -124,6 +125,16 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, err = ctx->enc ? crypto_skcipher_encrypt(&areq->cra_u.skcipher_req) : crypto_skcipher_decrypt(&areq->cra_u.skcipher_req); + + /* AIO operation in progress */ + if (err == -EINPROGRESS || err == -EBUSY) { + /* Remember output size that will be generated. */ + areq->outlen = len; + + return -EIOCBQUEUED; + } + + sock_put(sk); } else { /* Synchronous operation */ skcipher_request_set_callback(&areq->cra_u.skcipher_req, @@ -137,19 +148,9 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, &ctx->completion); } - /* AIO operation in progress */ - if (err == -EINPROGRESS) { - sock_hold(sk); - - /* Remember output size that will be generated. */ - areq->outlen = len; - - return -EIOCBQUEUED; - } free: - af_alg_free_areq_sgls(areq); - sock_kfree_s(sk, areq, areq->areqlen); + af_alg_free_resources(areq); return err ? err : len; } diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h index 75ec9c662268..aeec003a566b 100644 --- a/include/crypto/if_alg.h +++ b/include/crypto/if_alg.h @@ -255,6 +255,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size, unsigned int ivsize); ssize_t af_alg_sendpage(struct socket *sock, struct page *page, int offset, size_t size, int flags); +void af_alg_free_resources(struct af_alg_async_req *areq); void af_alg_async_cb(struct crypto_async_request *_req, int err); unsigned int af_alg_poll(struct file *file, struct socket *sock, poll_table *wait); From 861eae231b0437427676a4ed323d31097002d5a2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ondrej=20Mosn=C3=A1=C4=8Dek?= Date: Thu, 23 Nov 2017 13:49:06 +0100 Subject: [PATCH 0027/1013] crypto: skcipher - Fix skcipher_walk_aead_common commit c14ca8386539a298c1c19b003fe55e37d0f0e89c upstream. The skcipher_walk_aead_common function calls scatterwalk_copychunks on the input and output walks to skip the associated data. If the AD end at an SG list entry boundary, then after these calls the walks will still be pointing to the end of the skipped region. These offsets are later checked for alignment in skcipher_walk_next, so the skcipher_walk may detect the alignment incorrectly. This patch fixes it by calling scatterwalk_done after the copychunks calls to ensure that the offsets refer to the right SG list entry. 
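[ Editor's note, not part of the patch: the failure mode can be seen with a small userspace analogy of such a walk. After skipping associated data that ends exactly on a segment boundary, an un-normalized cursor still describes the end of the old segment, so an alignment check on that address can pass even though the byte processed next lives at a differently aligned address in the following segment; the added scatterwalk_done() calls provide the normalization. All names below are invented for illustration; this is not the kernel scatterwalk API. ]

#include <stdio.h>
#include <stdint.h>

struct seg { uint8_t *base; size_t len; };	/* toy stand-in for one SG entry */

int main(void)
{
	static uint8_t a[32] __attribute__((aligned(16)));
	static uint8_t b[32] __attribute__((aligned(16)));
	/* Second segment deliberately starts 1 byte into its buffer. */
	struct seg segs[2] = { { a, 16 }, { b + 1, 16 } };
	size_t idx = 0, off = 0;

	off += 16;	/* skip 16 bytes of AD: lands exactly on the boundary of segs[0] */

	/* Stale view: still "end of segment 0", which happens to be 16-byte aligned. */
	uintptr_t stale = (uintptr_t)(segs[idx].base + off);

	/* Normalized view: advance to the start of segment 1 before checking. */
	if (off == segs[idx].len) {
		idx++;
		off = 0;
	}
	uintptr_t next = (uintptr_t)(segs[idx].base + off);

	printf("stale %% 16 = %zu, next %% 16 = %zu\n",
	       (size_t)(stale % 16), (size_t)(next % 16));	/* prints 0 and 1 */
	return 0;
}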
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface") Signed-off-by: Ondrej Mosnacek Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/skcipher.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/crypto/skcipher.c b/crypto/skcipher.c index d5692e35fab1..778e0ff42bfa 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -522,6 +522,9 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk, scatterwalk_copychunks(NULL, &walk->in, req->assoclen, 2); scatterwalk_copychunks(NULL, &walk->out, req->assoclen, 2); + scatterwalk_done(&walk->in, 0, walk->total); + scatterwalk_done(&walk->out, 0, walk->total); + walk->iv = req->iv; walk->oiv = req->iv; From 84779085fa10014b9e8208d7e71b54bced73075c Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Thu, 2 Nov 2017 13:03:42 +0300 Subject: [PATCH 0028/1013] lockd: lost rollback of set_grace_period() in lockd_down_net() commit 3a2b19d1ee5633f76ae8a88da7bc039a5d1732aa upstream. Commit efda760fe95ea ("lockd: fix lockd shutdown race") is incorrect, it removes lockd_manager and disarm grace_period_end for init_net only. If nfsd was started from another net namespace lockd_up_net() calls set_grace_period() that adds lockd_manager into per-netns list and queues grace_period_end delayed work. These action should be reverted in lockd_down_net(). Otherwise it can lead to double list_add on after restart nfsd in netns, and to use-after-free if non-disarmed delayed work will be executed after netns destroy. Fixes: efda760fe95e ("lockd: fix lockd shutdown race") Signed-off-by: Vasily Averin Signed-off-by: J. Bruce Fields Signed-off-by: Greg Kroah-Hartman --- fs/lockd/svc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index f04ecfc7ece0..45e96549ebd2 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -274,6 +274,8 @@ static void lockd_down_net(struct svc_serv *serv, struct net *net) if (ln->nlmsvc_users) { if (--ln->nlmsvc_users == 0) { nlm_shutdown_hosts_net(net); + cancel_delayed_work_sync(&ln->grace_period_end); + locks_end_grace(&ln->lockd_manager); svc_shutdown_net(serv, net); dprintk("lockd_down_net: per-net data destroyed; net=%p\n", net); } From b367ea94535dbdede5e00110339b3e4c9bdc617d Mon Sep 17 00:00:00 2001 From: Martin Schwidefsky Date: Fri, 24 Nov 2017 16:23:15 +0100 Subject: [PATCH 0029/1013] s390: revert ELF_ET_DYN_BASE base changes commit 345f8f34bb473241d62803951c18a844dd705f8d upstream. This reverts commit a73dc5370e153ac63718d850bddf0c9aa9d871e6. Reducing the base address for 31-bit PIE executables from (STACK_TOP/3)*2 to 4MB broke several compat programs which use -fpie to move the executable out of the lower 16MB. Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/elf.h | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h index 9a3cb3983c01..1a61b1b997f2 100644 --- a/arch/s390/include/asm/elf.h +++ b/arch/s390/include/asm/elf.h @@ -194,13 +194,14 @@ struct arch_elf_state { #define CORE_DUMP_USE_REGSET #define ELF_EXEC_PAGESIZE PAGE_SIZE -/* - * This is the base location for PIE (ET_DYN with INTERP) loads. On - * 64-bit, this is raised to 4GB to leave the entire 32-bit address - * space open for things that want to use the area for 32-bit pointers. - */ -#define ELF_ET_DYN_BASE (is_compat_task() ? 0x000400000UL : \ - 0x100000000UL) +/* This is the location that an ET_DYN program is loaded if exec'ed. 
Typical + use of this is to invoke "./ld.so someprog" to test out a new version of + the loader. We need to make sure that it is out of the way of the program + that it will "exec", and that there is sufficient room for the brk. 64-bit + tasks are aligned to 4GB. */ +#define ELF_ET_DYN_BASE (is_compat_task() ? \ + (STACK_TOP / 3 * 2) : \ + (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1)) /* This yields a mask that user programs can use to figure out what instruction set this CPU supports. */ From 4479ade38b14f53673b0843308fe24344b37cba5 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Thu, 16 Nov 2017 09:50:19 +0100 Subject: [PATCH 0030/1013] drm: omapdrm: Fix DPI on platforms using the DSI VDDS commit bf25dac38f71d392a31ec074f55cbc941f1eaf1d upstream. Commit d178e034d565 ("drm: omapdrm: Move FEAT_DPI_USES_VDDS_DSI feature to dpi code") replaced usage of platform data version with SoC matching to configure DPI VDDS. The SoC match entries were incorrect, they should have matched on the machine name instead of the SoC family. Fix it. The result was observed on OpenPandora with OMAP3530 where the panel only had the Blue channel and Red&Green were missing. It was not observed on GTA04 with DM3730. Fixes: d178e034d565 ("drm: omapdrm: Move FEAT_DPI_USES_VDDS_DSI feature to dpi code") Signed-off-by: Laurent Pinchart Reported-by: H. Nikolaus Schaller Tested-by: H. Nikolaus Schaller Signed-off-by: Tomi Valkeinen Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/omapdrm/dss/dpi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/omapdrm/dss/dpi.c b/drivers/gpu/drm/omapdrm/dss/dpi.c index daf286fc8a40..ca1e3b489540 100644 --- a/drivers/gpu/drm/omapdrm/dss/dpi.c +++ b/drivers/gpu/drm/omapdrm/dss/dpi.c @@ -566,8 +566,8 @@ static int dpi_verify_pll(struct dss_pll *pll) } static const struct soc_device_attribute dpi_soc_devices[] = { - { .family = "OMAP3[456]*" }, - { .family = "[AD]M37*" }, + { .machine = "OMAP3[456]*" }, + { .machine = "[AD]M37*" }, { /* sentinel */ } }; From a3ed24cd28f2db6f6f5b2009565a2463c0df4af0 Mon Sep 17 00:00:00 2001 From: Peter Ujfalusi Date: Mon, 20 Nov 2017 11:51:40 +0200 Subject: [PATCH 0031/1013] omapdrm: hdmi4: Correct the SoC revision matching commit 23970e150a0a49f9a966c46e5d22fed06226098f upstream. I believe the intention of the commit 2c9fc9bf45f8 ("drm: omapdrm: Move FEAT_HDMI_* features to hdmi4 driver") was to identify omap4430 ES1.x, omap4430 ES2.x and other OMAP4 revisions, like omap4460. By using family=OMAP4 in the match the code will treat omap4460 ES1.x in a same way as it would treat omap4430 ES1.x This breaks HDMI audio on OMAP4460 devices (PandaES for example). Correct the match rule so we are not going to get false positive match. 
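[ Editor's note, not part of the patch: for readers unfamiliar with the SoC matching API, soc_device_match() returns the first table entry whose populated fields all glob-match the running SoC, and callers then use its ->data. A minimal sketch, assuming the usual <linux/sys_soc.h> semantics and reusing the feature structs introduced by the hunk below; the table and function names here are invented. ]

	static const struct soc_device_attribute example_soc_devices[] = {
		{ .machine = "OMAP4430", .revision = "ES1.?", .data = &hdmi4430_es1_features },
		{ .machine = "OMAP4430", .revision = "ES2.?", .data = &hdmi4430_es2_features },
		{ .family  = "OMAP4",                         .data = &hdmi4_features },
		{ /* sentinel */ }
	};

	static const struct hdmi4_features *example_pick_features(void)
	{
		const struct soc_device_attribute *soc;

		/*
		 * The first matching row wins, so the narrower OMAP4430 rows must
		 * come before the generic OMAP4 row; an OMAP4460 ES1.x then falls
		 * through to the generic entry instead of the 4430 ES1 quirks.
		 */
		soc = soc_device_match(example_soc_devices);
		return soc ? soc->data : &hdmi4_features;
	}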
Fixes: 2c9fc9bf45f8 ("drm: omapdrm: Move FEAT_HDMI_* features to hdmi4 driver") Signed-off-by: Peter Ujfalusi Reviewed-by: Laurent Pinchart Signed-off-by: Tomi Valkeinen Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/omapdrm/dss/hdmi4_core.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_core.c b/drivers/gpu/drm/omapdrm/dss/hdmi4_core.c index 365cf07daa01..c3453f3bd603 100644 --- a/drivers/gpu/drm/omapdrm/dss/hdmi4_core.c +++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_core.c @@ -889,25 +889,36 @@ struct hdmi4_features { bool audio_use_mclk; }; -static const struct hdmi4_features hdmi4_es1_features = { +static const struct hdmi4_features hdmi4430_es1_features = { .cts_swmode = false, .audio_use_mclk = false, }; -static const struct hdmi4_features hdmi4_es2_features = { +static const struct hdmi4_features hdmi4430_es2_features = { .cts_swmode = true, .audio_use_mclk = false, }; -static const struct hdmi4_features hdmi4_es3_features = { +static const struct hdmi4_features hdmi4_features = { .cts_swmode = true, .audio_use_mclk = true, }; static const struct soc_device_attribute hdmi4_soc_devices[] = { - { .family = "OMAP4", .revision = "ES1.?", .data = &hdmi4_es1_features }, - { .family = "OMAP4", .revision = "ES2.?", .data = &hdmi4_es2_features }, - { .family = "OMAP4", .data = &hdmi4_es3_features }, + { + .machine = "OMAP4430", + .revision = "ES1.?", + .data = &hdmi4430_es1_features, + }, + { + .machine = "OMAP4430", + .revision = "ES2.?", + .data = &hdmi4430_es2_features, + }, + { + .family = "OMAP4", + .data = &hdmi4_features, + }, { /* sentinel */ } }; From 69af22696bc737dfe3444d2ece5db000fbff5c35 Mon Sep 17 00:00:00 2001 From: John Johansen Date: Wed, 22 Nov 2017 07:33:38 -0800 Subject: [PATCH 0032/1013] apparmor: fix oops in audit_signal_cb hook commit b12cbb21586277f72533769832c24cc6c1d60ab3 upstream. The apparmor_audit_data struct ordering got messed up during a merge conflict, resulting in the signal integer and peer pointer being in a union instead of a struct. For most of the 4.13 and 4.14 life cycle, this was hidden by commit 651e28c5537a ("apparmor: add base infastructure for socket mediation") which fixed the apparmor_audit_data struct when its data was added. When that commit was reverted in -rc7 the signal audit bug was exposed, and unfortunately it never showed up in any of the testing until after 4.14 was released. Shaun Khan, Zephaniah E. Loss-Cutler-Hull filed nearly simultaneous bug reports (with different oopes, the smaller of which is included below). Full credit goes to Tetsuo Handa for jumping on this as well and noticing the audit data struct problem and reporting it. [ 76.178568] BUG: unable to handle kernel paging request at ffffffff0eee3bc0 [ 76.178579] IP: audit_signal_cb+0x6c/0xe0 [ 76.178581] PGD 1a640a067 P4D 1a640a067 PUD 0 [ 76.178586] Oops: 0000 [#1] PREEMPT SMP [ 76.178589] Modules linked in: fuse rfcomm bnep usblp uvcvideo btusb btrtl btbcm btintel bluetooth ecdh_generic ip6table_filter ip6_tables xt_tcpudp nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables x_tables intel_rapl joydev wmi_bmof serio_raw iwldvm iwlwifi shpchp kvm_intel kvm irqbypass autofs4 algif_skcipher nls_iso8859_1 nls_cp437 crc32_pclmul ghash_clmulni_intel [ 76.178620] CPU: 0 PID: 10675 Comm: pidgin Not tainted 4.14.0-f1-dirty #135 [ 76.178623] Hardware name: Hewlett-Packard HP EliteBook Folio 9470m/18DF, BIOS 68IBD Ver. 
F.62 10/22/2015 [ 76.178625] task: ffff9c7a94c31dc0 task.stack: ffffa09b02a4c000 [ 76.178628] RIP: 0010:audit_signal_cb+0x6c/0xe0 [ 76.178631] RSP: 0018:ffffa09b02a4fc08 EFLAGS: 00010292 [ 76.178634] RAX: ffffa09b02a4fd60 RBX: ffff9c7aee0741f8 RCX: 0000000000000000 [ 76.178636] RDX: ffffffffee012290 RSI: 0000000000000006 RDI: ffff9c7a9493d800 [ 76.178638] RBP: ffffa09b02a4fd40 R08: 000000000000004d R09: ffffa09b02a4fc46 [ 76.178641] R10: ffffa09b02a4fcb8 R11: ffff9c7ab44f5072 R12: ffffa09b02a4fd40 [ 76.178643] R13: ffffffff9e447be0 R14: ffff9c7a94c31dc0 R15: 0000000000000001 [ 76.178646] FS: 00007f8b11ba2a80(0000) GS:ffff9c7afea00000(0000) knlGS:0000000000000000 [ 76.178648] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 76.178650] CR2: ffffffff0eee3bc0 CR3: 00000003d5209002 CR4: 00000000001606f0 [ 76.178652] Call Trace: [ 76.178660] common_lsm_audit+0x1da/0x780 [ 76.178665] ? d_absolute_path+0x60/0x90 [ 76.178669] ? aa_check_perms+0xcd/0xe0 [ 76.178672] aa_check_perms+0xcd/0xe0 [ 76.178675] profile_signal_perm.part.0+0x90/0xa0 [ 76.178679] aa_may_signal+0x16e/0x1b0 [ 76.178686] apparmor_task_kill+0x51/0x120 [ 76.178690] security_task_kill+0x44/0x60 [ 76.178695] group_send_sig_info+0x25/0x60 [ 76.178699] kill_pid_info+0x36/0x60 [ 76.178703] SYSC_kill+0xdb/0x180 [ 76.178707] ? preempt_count_sub+0x92/0xd0 [ 76.178712] ? _raw_write_unlock_irq+0x13/0x30 [ 76.178716] ? task_work_run+0x6a/0x90 [ 76.178720] ? exit_to_usermode_loop+0x80/0xa0 [ 76.178723] entry_SYSCALL_64_fastpath+0x13/0x94 [ 76.178727] RIP: 0033:0x7f8b0e58b767 [ 76.178729] RSP: 002b:00007fff19efd4d8 EFLAGS: 00000206 ORIG_RAX: 000000000000003e [ 76.178732] RAX: ffffffffffffffda RBX: 0000557f3e3c2050 RCX: 00007f8b0e58b767 [ 76.178735] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000263b [ 76.178737] RBP: 0000000000000000 R08: 0000557f3e3c2270 R09: 0000000000000001 [ 76.178739] R10: 000000000000022d R11: 0000000000000206 R12: 0000000000000000 [ 76.178741] R13: 0000000000000001 R14: 0000557f3e3c13c0 R15: 0000000000000000 [ 76.178745] Code: 48 8b 55 18 48 89 df 41 b8 20 00 08 01 5b 5d 48 8b 42 10 48 8b 52 30 48 63 48 4c 48 8b 44 c8 48 31 c9 48 8b 70 38 e9 f4 fd 00 00 <48> 8b 14 d5 40 27 e5 9e 48 c7 c6 7d 07 19 9f 48 89 df e8 fd 35 [ 76.178794] RIP: audit_signal_cb+0x6c/0xe0 RSP: ffffa09b02a4fc08 [ 76.178796] CR2: ffffffff0eee3bc0 [ 76.178799] ---[ end trace 514af9529297f1a3 ]--- Fixes: cd1dbf76b23d ("apparmor: add the ability to mediate signals") Reported-by: Zephaniah E. Loss-Cutler-Hull Reported-by: Shuah Khan Suggested-by: Tetsuo Handa Tested-by: Ivan Kozik Tested-by: Zephaniah E. 
Loss-Cutler-Hull Tested-by: Christian Boltz Tested-by: Shuah Khan Signed-off-by: John Johansen Signed-off-by: Greg Kroah-Hartman --- security/apparmor/include/audit.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/security/apparmor/include/audit.h b/security/apparmor/include/audit.h index 620e81169659..4ac095118717 100644 --- a/security/apparmor/include/audit.h +++ b/security/apparmor/include/audit.h @@ -121,17 +121,19 @@ struct apparmor_audit_data { /* these entries require a custom callback fn */ struct { struct aa_label *peer; - struct { - const char *target; - kuid_t ouid; - } fs; + union { + struct { + const char *target; + kuid_t ouid; + } fs; + int signal; + }; }; struct { struct aa_profile *profile; const char *ns; long pos; } iface; - int signal; struct { int rlim; unsigned long max; From e1b4e001b22a12b67c890f4611bcf55285fd0fc5 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 20 Nov 2017 17:41:29 +0000 Subject: [PATCH 0033/1013] arm64: module-plts: factor out PLT generation code for ftrace commit 7e8b9c1d2e2f5f45db7d40b50d14f606097c25de upstream. To allow the ftrace trampoline code to reuse the PLT entry routines, factor it out and move it into asm/module.h. Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/module.h | 44 +++++++++++++++++++++++++++++++++ arch/arm64/kernel/module-plts.c | 38 ++-------------------------- 2 files changed, 46 insertions(+), 36 deletions(-) diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h index 19bd97671bb8..11d4aaee82e1 100644 --- a/arch/arm64/include/asm/module.h +++ b/arch/arm64/include/asm/module.h @@ -45,4 +45,48 @@ extern u64 module_alloc_base; #define module_alloc_base ((u64)_etext - MODULES_VSIZE) #endif +struct plt_entry { + /* + * A program that conforms to the AArch64 Procedure Call Standard + * (AAPCS64) must assume that a veneer that alters IP0 (x16) and/or + * IP1 (x17) may be inserted at any branch instruction that is + * exposed to a relocation that supports long branches. Since that + * is exactly what we are dealing with here, we are free to use x16 + * as a scratch register in the PLT veneers. + */ + __le32 mov0; /* movn x16, #0x.... 
*/ + __le32 mov1; /* movk x16, #0x...., lsl #16 */ + __le32 mov2; /* movk x16, #0x...., lsl #32 */ + __le32 br; /* br x16 */ +}; + +static inline struct plt_entry get_plt_entry(u64 val) +{ + /* + * MOVK/MOVN/MOVZ opcode: + * +--------+------------+--------+-----------+-------------+---------+ + * | sf[31] | opc[30:29] | 100101 | hw[22:21] | imm16[20:5] | Rd[4:0] | + * +--------+------------+--------+-----------+-------------+---------+ + * + * Rd := 0x10 (x16) + * hw := 0b00 (no shift), 0b01 (lsl #16), 0b10 (lsl #32) + * opc := 0b11 (MOVK), 0b00 (MOVN), 0b10 (MOVZ) + * sf := 1 (64-bit variant) + */ + return (struct plt_entry){ + cpu_to_le32(0x92800010 | (((~val ) & 0xffff)) << 5), + cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5), + cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5), + cpu_to_le32(0xd61f0200) + }; +} + +static inline bool plt_entries_equal(const struct plt_entry *a, + const struct plt_entry *b) +{ + return a->mov0 == b->mov0 && + a->mov1 == b->mov1 && + a->mov2 == b->mov2; +} + #endif /* __ASM_MODULE_H */ diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c index d05dbe658409..ebff6c155cac 100644 --- a/arch/arm64/kernel/module-plts.c +++ b/arch/arm64/kernel/module-plts.c @@ -11,21 +11,6 @@ #include #include -struct plt_entry { - /* - * A program that conforms to the AArch64 Procedure Call Standard - * (AAPCS64) must assume that a veneer that alters IP0 (x16) and/or - * IP1 (x17) may be inserted at any branch instruction that is - * exposed to a relocation that supports long branches. Since that - * is exactly what we are dealing with here, we are free to use x16 - * as a scratch register in the PLT veneers. - */ - __le32 mov0; /* movn x16, #0x.... */ - __le32 mov1; /* movk x16, #0x...., lsl #16 */ - __le32 mov2; /* movk x16, #0x...., lsl #32 */ - __le32 br; /* br x16 */ -}; - static bool in_init(const struct module *mod, void *loc) { return (u64)loc - (u64)mod->init_layout.base < mod->init_layout.size; @@ -40,33 +25,14 @@ u64 module_emit_plt_entry(struct module *mod, void *loc, const Elf64_Rela *rela, int i = pltsec->plt_num_entries; u64 val = sym->st_value + rela->r_addend; - /* - * MOVK/MOVN/MOVZ opcode: - * +--------+------------+--------+-----------+-------------+---------+ - * | sf[31] | opc[30:29] | 100101 | hw[22:21] | imm16[20:5] | Rd[4:0] | - * +--------+------------+--------+-----------+-------------+---------+ - * - * Rd := 0x10 (x16) - * hw := 0b00 (no shift), 0b01 (lsl #16), 0b10 (lsl #32) - * opc := 0b11 (MOVK), 0b00 (MOVN), 0b10 (MOVZ) - * sf := 1 (64-bit variant) - */ - plt[i] = (struct plt_entry){ - cpu_to_le32(0x92800010 | (((~val ) & 0xffff)) << 5), - cpu_to_le32(0xf2a00010 | ((( val >> 16) & 0xffff)) << 5), - cpu_to_le32(0xf2c00010 | ((( val >> 32) & 0xffff)) << 5), - cpu_to_le32(0xd61f0200) - }; + plt[i] = get_plt_entry(val); /* * Check if the entry we just created is a duplicate. Given that the * relocations are sorted, this will be the last entry we allocated. * (if one exists). */ - if (i > 0 && - plt[i].mov0 == plt[i - 1].mov0 && - plt[i].mov1 == plt[i - 1].mov1 && - plt[i].mov2 == plt[i - 1].mov2) + if (i > 0 && plt_entries_equal(plt + i, plt + i - 1)) return (u64)&plt[i - 1]; pltsec->plt_num_entries++; From e9b2a6680fb6173819598cba411522d37cb4585a Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 20 Nov 2017 17:41:30 +0000 Subject: [PATCH 0034/1013] arm64: ftrace: emit ftrace-mod.o contents through code commit be0f272bfc83797f70d44faca86954df62e2bbc0 upstream. 
When building the arm64 kernel with both CONFIG_ARM64_MODULE_PLTS and CONFIG_DYNAMIC_FTRACE enabled, the ftrace-mod.o object file is built with the kernel and contains a trampoline that is linked into each module, so that modules can be loaded far away from the kernel and still reach the ftrace entry point in the core kernel with an ordinary relative branch, as is emitted by the compiler instrumentation code dynamic ftrace relies on. In order to be able to build out of tree modules, this object file needs to be included into the linux-headers or linux-devel packages, which is undesirable, as it makes arm64 a special case (although a precedent does exist for 32-bit PPC). Given that the trampoline essentially consists of a PLT entry, let's not bother with a source or object file for it, and simply patch it in whenever the trampoline is being populated, using the existing PLT support routines. Signed-off-by: Ard Biesheuvel Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/Makefile | 3 --- arch/arm64/include/asm/module.h | 2 +- arch/arm64/kernel/Makefile | 3 --- arch/arm64/kernel/ftrace-mod.S | 18 ------------------ arch/arm64/kernel/ftrace.c | 14 ++++++++------ arch/arm64/kernel/module-plts.c | 12 ++++++++++++ arch/arm64/kernel/module.lds | 1 + 7 files changed, 22 insertions(+), 31 deletions(-) delete mode 100644 arch/arm64/kernel/ftrace-mod.S diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 939b310913cf..3eb4397150df 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -77,9 +77,6 @@ endif ifeq ($(CONFIG_ARM64_MODULE_PLTS),y) KBUILD_LDFLAGS_MODULE += -T $(srctree)/arch/arm64/kernel/module.lds -ifeq ($(CONFIG_DYNAMIC_FTRACE),y) -KBUILD_LDFLAGS_MODULE += $(objtree)/arch/arm64/kernel/ftrace-mod.o -endif endif # Default value diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h index 11d4aaee82e1..4f766178fa6f 100644 --- a/arch/arm64/include/asm/module.h +++ b/arch/arm64/include/asm/module.h @@ -32,7 +32,7 @@ struct mod_arch_specific { struct mod_plt_sec init; /* for CONFIG_DYNAMIC_FTRACE */ - void *ftrace_trampoline; + struct plt_entry *ftrace_trampoline; }; #endif diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile index 0029e13adb59..2f5ff2a65db3 100644 --- a/arch/arm64/kernel/Makefile +++ b/arch/arm64/kernel/Makefile @@ -63,6 +63,3 @@ extra-y += $(head-y) vmlinux.lds ifeq ($(CONFIG_DEBUG_EFI),y) AFLAGS_head.o += -DVMLINUX_PATH="\"$(realpath $(objtree)/vmlinux)\"" endif - -# will be included by each individual module but not by the core kernel itself -extra-$(CONFIG_DYNAMIC_FTRACE) += ftrace-mod.o diff --git a/arch/arm64/kernel/ftrace-mod.S b/arch/arm64/kernel/ftrace-mod.S deleted file mode 100644 index 00c4025be4ff..000000000000 --- a/arch/arm64/kernel/ftrace-mod.S +++ /dev/null @@ -1,18 +0,0 @@ -/* - * Copyright (C) 2017 Linaro Ltd - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. 
- */ - -#include -#include - - .section ".text.ftrace_trampoline", "ax" - .align 3 -0: .quad 0 -__ftrace_trampoline: - ldr x16, 0b - br x16 -ENDPROC(__ftrace_trampoline) diff --git a/arch/arm64/kernel/ftrace.c b/arch/arm64/kernel/ftrace.c index c13b1fca0e5b..50986e388d2b 100644 --- a/arch/arm64/kernel/ftrace.c +++ b/arch/arm64/kernel/ftrace.c @@ -76,7 +76,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) if (offset < -SZ_128M || offset >= SZ_128M) { #ifdef CONFIG_ARM64_MODULE_PLTS - unsigned long *trampoline; + struct plt_entry trampoline; struct module *mod; /* @@ -104,22 +104,24 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr) * is added in the future, but for now, the pr_err() below * deals with a theoretical issue only. */ - trampoline = (unsigned long *)mod->arch.ftrace_trampoline; - if (trampoline[0] != addr) { - if (trampoline[0] != 0) { + trampoline = get_plt_entry(addr); + if (!plt_entries_equal(mod->arch.ftrace_trampoline, + &trampoline)) { + if (!plt_entries_equal(mod->arch.ftrace_trampoline, + &(struct plt_entry){})) { pr_err("ftrace: far branches to multiple entry points unsupported inside a single module\n"); return -EINVAL; } /* point the trampoline to our ftrace entry point */ module_disable_ro(mod); - trampoline[0] = addr; + *mod->arch.ftrace_trampoline = trampoline; module_enable_ro(mod, true); /* update trampoline before patching in the branch */ smp_wmb(); } - addr = (unsigned long)&trampoline[1]; + addr = (unsigned long)(void *)mod->arch.ftrace_trampoline; #else /* CONFIG_ARM64_MODULE_PLTS */ return -EINVAL; #endif /* CONFIG_ARM64_MODULE_PLTS */ diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c index ebff6c155cac..ea640f92fe5a 100644 --- a/arch/arm64/kernel/module-plts.c +++ b/arch/arm64/kernel/module-plts.c @@ -120,6 +120,7 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, unsigned long core_plts = 0; unsigned long init_plts = 0; Elf64_Sym *syms = NULL; + Elf_Shdr *tramp = NULL; int i; /* @@ -131,6 +132,10 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, mod->arch.core.plt = sechdrs + i; else if (!strcmp(secstrings + sechdrs[i].sh_name, ".init.plt")) mod->arch.init.plt = sechdrs + i; + else if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE) && + !strcmp(secstrings + sechdrs[i].sh_name, + ".text.ftrace_trampoline")) + tramp = sechdrs + i; else if (sechdrs[i].sh_type == SHT_SYMTAB) syms = (Elf64_Sym *)sechdrs[i].sh_addr; } @@ -181,5 +186,12 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs, mod->arch.init.plt_num_entries = 0; mod->arch.init.plt_max_entries = init_plts; + if (tramp) { + tramp->sh_type = SHT_NOBITS; + tramp->sh_flags = SHF_EXECINSTR | SHF_ALLOC; + tramp->sh_addralign = __alignof__(struct plt_entry); + tramp->sh_size = sizeof(struct plt_entry); + } + return 0; } diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds index f7c9781a9d48..22e36a21c113 100644 --- a/arch/arm64/kernel/module.lds +++ b/arch/arm64/kernel/module.lds @@ -1,4 +1,5 @@ SECTIONS { .plt (NOLOAD) : { BYTE(0) } .init.plt (NOLOAD) : { BYTE(0) } + .text.ftrace_trampoline (NOLOAD) : { BYTE(0) } } From 3bfbf2b1bc18f76dc88064b4d6354b2e4424478c Mon Sep 17 00:00:00 2001 From: Mahesh Salgaonkar Date: Wed, 22 Nov 2017 23:02:07 +0530 Subject: [PATCH 0035/1013] powerpc/powernv: Fix kexec crashes caused by tlbie tracing commit a3961f824cdbe7eb431254dc7d8f6f6767f474aa upstream. 
Rebooting into a new kernel with kexec fails in trace_tlbie() which is called from native_hpte_clear(). This happens if the running kernel has CONFIG_LOCKDEP enabled. With lockdep enabled, the tracepoints always execute few RCU checks regardless of whether tracing is on or off. We are already in the last phase of kexec sequence in real mode with HILE_BE set. At this point the RCU check ends up in RCU_LOCKDEP_WARN and causes kexec to fail. Fix this by not calling trace_tlbie() from native_hpte_clear(). mpe: It's not safe to call trace points at this point in the kexec path, even if we could avoid the RCU checks/warnings. The only solution is to not call them. Fixes: 0428491cba92 ("powerpc/mm: Trace tlbie(l) instructions") Signed-off-by: Mahesh Salgaonkar Reported-by: Aneesh Kumar K.V Suggested-by: Michael Ellerman Acked-by: Naveen N. Rao Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/mm/hash_native_64.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c index 3848af167df9..640cf566e986 100644 --- a/arch/powerpc/mm/hash_native_64.c +++ b/arch/powerpc/mm/hash_native_64.c @@ -47,7 +47,8 @@ DEFINE_RAW_SPINLOCK(native_tlbie_lock); -static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize) +static inline unsigned long ___tlbie(unsigned long vpn, int psize, + int apsize, int ssize) { unsigned long va; unsigned int penc; @@ -100,7 +101,15 @@ static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize) : "memory"); break; } - trace_tlbie(0, 0, va, 0, 0, 0, 0); + return va; +} + +static inline void __tlbie(unsigned long vpn, int psize, int apsize, int ssize) +{ + unsigned long rb; + + rb = ___tlbie(vpn, psize, apsize, ssize); + trace_tlbie(0, 0, rb, 0, 0, 0, 0); } static inline void __tlbiel(unsigned long vpn, int psize, int apsize, int ssize) @@ -652,7 +661,7 @@ static void native_hpte_clear(void) if (hpte_v & HPTE_V_VALID) { hpte_decode(hptep, slot, &psize, &apsize, &ssize, &vpn); hptep->v = 0; - __tlbie(vpn, psize, apsize, ssize); + ___tlbie(vpn, psize, apsize, ssize); } } From c12e358867d2656b10ccccc36240d6ce93e0e497 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Fri, 24 Nov 2017 14:51:02 +1100 Subject: [PATCH 0036/1013] powerpc/kexec: Fix kexec/kdump in P9 guest kernels commit 2621e945fbf1d6df5f3f0ba7be5bae3d2cf9b6a5 upstream. The code that cleans up the IAMR/AMOR before kexec'ing failed to remember that when we're running as a guest AMOR is not writable, it's hypervisor privileged. They symptom is that the kexec stops before entering purgatory and nothing else is seen on the console. If you examine the state of the system all threads will be in the 0x700 program check handler. Fix it by making the write to AMOR dependent on HV mode. Fixes: 1e2a516e89fc ("powerpc/kexec: Fix radix to hash kexec due to IAMR/AMOR") Reported-by: Yilin Zhang Debugged-by: David Gibson Signed-off-by: Michael Ellerman Acked-by: Balbir Singh Reviewed-by: David Gibson Tested-by: David Gibson Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/misc_64.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index 8ac0bd2bddb0..3280953a82cf 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -623,7 +623,9 @@ BEGIN_FTR_SECTION * NOTE, we rely on r0 being 0 from above. 
*/ mtspr SPRN_IAMR,r0 +BEGIN_FTR_SECTION_NESTED(42) mtspr SPRN_AMOR,r0 +END_FTR_SECTION_NESTED_IFSET(CPU_FTR_HVMODE, 42) END_FTR_SECTION_IFSET(CPU_FTR_ARCH_300) /* save regs for local vars on new stack. From 6c4eaffbad3701f037f1d60ca892dcbe7641ff74 Mon Sep 17 00:00:00 2001 From: Liran Alon Date: Sun, 5 Nov 2017 16:11:30 +0200 Subject: [PATCH 0037/1013] KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 51c4b8bba674cfd2260d173602c4dac08e4c3a99 upstream. When guest passes KVM it's pvclock-page GPA via WRMSR to MSR_KVM_SYSTEM_TIME / MSR_KVM_SYSTEM_TIME_NEW, KVM don't initialize pvclock-page to some start-values. It just requests a clock-update which will happen before entering to guest. The clock-update logic will call kvm_setup_pvclock_page() to update the pvclock-page with info. However, kvm_setup_pvclock_page() *wrongly* assumes that the version-field is initialized to an even number. This is wrong because at first-time write, field could be any-value. Fix simply makes sure that if first-time version-field is odd, increment it once more to make it even and only then start standard logic. This follows same logic as done in other pvclock shared-pages (See kvm_write_wall_clock() and record_steal_time()). Signed-off-by: Liran Alon Reviewed-by: Nikita Leshenko Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 03869eb7fcd6..181106080e41 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1830,6 +1830,9 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v) */ BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); + if (guest_hv_clock.version & 1) + ++guest_hv_clock.version; /* first time write, random junk */ + vcpu->hv_clock.version = guest_hv_clock.version + 1; kvm_write_guest_cached(v->kvm, &vcpu->pv_time, &vcpu->hv_clock, From 15cc35bc76f95d22d1e3e42617f6103568f19931 Mon Sep 17 00:00:00 2001 From: Liran Alon Date: Sun, 5 Nov 2017 16:56:32 +0200 Subject: [PATCH 0038/1013] KVM: x86: Exit to user-mode on #UD intercept when emulator requires MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 61cb57c9ed631c95b54f8e9090c89d18b3695b3c upstream. Instruction emulation after trapping a #UD exception can result in an MMIO access, for example when emulating a MOVBE on a processor that doesn't support the instruction. In this case, the #UD vmexit handler must exit to user mode, but there wasn't any code to do so. Add it for both VMX and SVM. 
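[ Editor's note, not part of the patch: annotated paraphrase of the hunks below. In KVM's x86 vmexit handlers a return value of 1 resumes the guest while 0 requests an exit to user space, so the new EMULATE_USER_EXIT check is what lets the VMM complete the MMIO access. ]

	er = emulate_instruction(vcpu, EMULTYPE_TRAP_UD);
	if (er == EMULATE_USER_EXIT)		/* e.g. emulated MMIO: finish in user space */
		return 0;			/* 0 => leave KVM_RUN and report to the VMM */
	if (er != EMULATE_DONE)
		kvm_queue_exception(vcpu, UD_VECTOR);
	return 1;				/* 1 => re-enter the guest */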
Signed-off-by: Liran Alon Reviewed-by: Nikita Leshenko Reviewed-by: Konrad Rzeszutek Wilk Signed-off-by: Konrad Rzeszutek Wilk Reviewed-by: Wanpeng Li Reviewed-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/svm.c | 2 ++ arch/x86/kvm/vmx.c | 2 ++ 2 files changed, 4 insertions(+) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index ca209a4a7834..17fb6c6d939a 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2189,6 +2189,8 @@ static int ud_interception(struct vcpu_svm *svm) int er; er = emulate_instruction(&svm->vcpu, EMULTYPE_TRAP_UD); + if (er == EMULATE_USER_EXIT) + return 0; if (er != EMULATE_DONE) kvm_queue_exception(&svm->vcpu, UD_VECTOR); return 1; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 21cad7068cbf..b21113bcf227 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5914,6 +5914,8 @@ static int handle_exception(struct kvm_vcpu *vcpu) return 1; } er = emulate_instruction(vcpu, EMULTYPE_TRAP_UD); + if (er == EMULATE_USER_EXIT) + return 0; if (er != EMULATE_DONE) kvm_queue_exception(vcpu, UD_VECTOR); return 1; From bb98bd9733b5b54b0de67e171b0057dcc6fb98de Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Fri, 10 Nov 2017 10:49:38 +0100 Subject: [PATCH 0039/1013] KVM: x86: inject exceptions produced by x86_decode_insn MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 6ea6e84309ca7e0e850b3083e6b09344ee15c290 upstream. Sometimes, a processor might execute an instruction while another processor is updating the page tables for that instruction's code page, but before the TLB shootdown completes. The interesting case happens if the page is in the TLB. In general, the processor will succeed in executing the instruction and nothing bad happens. However, what if the instruction is an MMIO access? If *that* happens, KVM invokes the emulator, and the emulator gets the updated page tables. If the update side had marked the code page as non present, the page table walk then will fail and so will x86_decode_insn. Unfortunately, even though kvm_fetch_guest_virt is correctly returning X86EMUL_PROPAGATE_FAULT, x86_decode_insn's caller treats the failure as a fatal error if the instruction cannot simply be reexecuted (as is the case for MMIO). And this in fact happened sometimes when rebooting Windows 2012r2 guests. Just checking ctxt->have_exception and injecting the exception if true is enough to fix the case. Thanks to Eduardo Habkost for helping in the debugging of this issue. Reported-by: Yanan Fu Cc: Eduardo Habkost Signed-off-by: Paolo Bonzini Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 181106080e41..4195cbcdb310 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -5708,6 +5708,8 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, if (reexecute_instruction(vcpu, cr2, write_fault_to_spt, emulation_type)) return EMULATE_DONE; + if (ctxt->have_exception && inject_emulated_exception(vcpu)) + return EMULATE_DONE; if (emulation_type & EMULTYPE_SKIP) return EMULATE_FAIL; return handle_emulation_failure(vcpu); From 47b35d190b4676953a5ce25572e031d842264194 Mon Sep 17 00:00:00 2001 From: "Dr. David Alan Gilbert" Date: Fri, 17 Nov 2017 11:52:49 +0000 Subject: [PATCH 0040/1013] KVM: lapic: Split out x2apic ldr calculation commit e872fa94662d0644057c7c80b3071bdb9249e5ab upstream. 
Split out the ldr calculation from kvm_apic_set_x2apic_id since we're about to reuse it in the following patch. Signed-off-by: Dr. David Alan Gilbert Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 36c90d631096..4991e9e51611 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -266,9 +266,14 @@ static inline void kvm_apic_set_ldr(struct kvm_lapic *apic, u32 id) recalculate_apic_map(apic->vcpu->kvm); } +static inline u32 kvm_apic_calc_x2apic_ldr(u32 id) +{ + return ((id >> 4) << 16) | (1 << (id & 0xf)); +} + static inline void kvm_apic_set_x2apic_id(struct kvm_lapic *apic, u32 id) { - u32 ldr = ((id >> 4) << 16) | (1 << (id & 0xf)); + u32 ldr = kvm_apic_calc_x2apic_ldr(id); WARN_ON_ONCE(id != apic->vcpu->vcpu_id); From 81b3019c35d650e7570ff016ba0732ba6d62c466 Mon Sep 17 00:00:00 2001 From: "Dr. David Alan Gilbert" Date: Fri, 17 Nov 2017 11:52:50 +0000 Subject: [PATCH 0041/1013] KVM: lapic: Fixup LDR on load in x2apic commit 12806ba937382fdfdbad62a399aa2dce65c10fcd upstream. In x2apic mode the LDR is fixed based on the ID rather than separately loadable like it was before x2. When kvm_apic_set_state is called, the base is set, and if it has the X2APIC_ENABLE flag set then the LDR is calculated; however that value gets overwritten by the memcpy a few lines below overwriting it with the value that came from userland. The symptom is a lack of EOI after loading the state (e.g. after a QEMU migration) and is due to the EOI bitmap being wrong due to the incorrect LDR. This was seen with a Win2016 guest under Qemu with irqchip=split whose USB mouse didn't work after a VM migration. This corresponds to RH bug: https://bugzilla.redhat.com/show_bug.cgi?id=1502591 Reported-by: Yiqian Wei Signed-off-by: Dr. David Alan Gilbert [Applied fixup from Liran Alon. - Paolo] Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/lapic.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 4991e9e51611..ef03efba1c23 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -2201,6 +2201,7 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu, { if (apic_x2apic_mode(vcpu->arch.apic)) { u32 *id = (u32 *)(s->regs + APIC_ID); + u32 *ldr = (u32 *)(s->regs + APIC_LDR); if (vcpu->kvm->arch.x2apic_format) { if (*id != vcpu->vcpu_id) @@ -2211,6 +2212,10 @@ static int kvm_apic_state_fixup(struct kvm_vcpu *vcpu, else *id <<= 24; } + + /* In x2APIC mode, the LDR is fixed and based on the id */ + if (set) + *ldr = kvm_apic_calc_x2apic_ldr(*id); } return 0; From 4162b2973c1c6e49d7b193e572dec691233115c3 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Mon, 27 Nov 2017 11:28:50 +0100 Subject: [PATCH 0042/1013] mmc: sdhci: Avoid swiotlb buffer being full commit 250dcd11466e06df64b92520e2c56bdae453581b upstream. The commit de3ee99b097d ("mmc: Delete bounce buffer handling") deletes the bounce buffer handling, but also causes the max_req_size for sdhci to be increased, in case when max_segs == 1. This causes errors for sdhci-pci Ricoh variant, about the swiotlb buffer to become full. Fix the issue, by taking IO_TLB_SEGSIZE and IO_TLB_SHIFT into account when deciding the max_req_size for sdhci. 
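[ Editor's note, not part of the patch: with the conventional swiotlb constants of IO_TLB_SHIFT = 11 (2 KiB slabs) and IO_TLB_SEGSIZE = 128 contiguous slabs per mapping, the cap introduced here works out to 256 KiB, below the 512 KiB SDMA default. A standalone arithmetic check; the constant values are assumed by the editor, not taken from the patch. ]

#include <stdio.h>

#define IO_TLB_SHIFT	11	/* one swiotlb slab is 2048 bytes */
#define IO_TLB_SEGSIZE	128	/* max contiguous slabs for a single mapping */

int main(void)
{
	unsigned int max_req_size = 524288;	/* SDMA boundary, 512 KiB */
	unsigned int swiotlb_cap = (1U << IO_TLB_SHIFT) * IO_TLB_SEGSIZE;

	if (swiotlb_cap < max_req_size)
		max_req_size = swiotlb_cap;

	printf("swiotlb cap = %u bytes, max_req_size = %u bytes\n",
	       swiotlb_cap, max_req_size);	/* 262144 and 262144 */
	return 0;
}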
Reported-by: Jiri Slaby Fixes: de3ee99b097d ("mmc: Delete bounce buffer handling") Signed-off-by: Ulf Hansson Tested-by: Jiri Slaby Acked-by: Adrian Hunter Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 0d5fcca18c9e..6152e83ff935 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -21,6 +21,7 @@ #include #include #include +#include #include #include #include @@ -3650,23 +3651,30 @@ int sdhci_setup_host(struct sdhci_host *host) spin_lock_init(&host->lock); + /* + * Maximum number of sectors in one transfer. Limited by SDMA boundary + * size (512KiB). Note some tuning modes impose a 4MiB limit, but this + * is less anyway. + */ + mmc->max_req_size = 524288; + /* * Maximum number of segments. Depends on if the hardware * can do scatter/gather or not. */ - if (host->flags & SDHCI_USE_ADMA) + if (host->flags & SDHCI_USE_ADMA) { mmc->max_segs = SDHCI_MAX_SEGS; - else if (host->flags & SDHCI_USE_SDMA) + } else if (host->flags & SDHCI_USE_SDMA) { mmc->max_segs = 1; - else /* PIO */ + if (swiotlb_max_segment()) { + unsigned int max_req_size = (1 << IO_TLB_SHIFT) * + IO_TLB_SEGSIZE; + mmc->max_req_size = min(mmc->max_req_size, + max_req_size); + } + } else { /* PIO */ mmc->max_segs = SDHCI_MAX_SEGS; - - /* - * Maximum number of sectors in one transfer. Limited by SDMA boundary - * size (512KiB). Note some tuning modes impose a 4MiB limit, but this - * is less anyway. - */ - mmc->max_req_size = 524288; + } /* * Maximum segment size. Could be one segment with the maximum number From 0d8d2b6373c19f85bab6af1702563b50e17b2cf6 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 21 Nov 2017 15:42:27 +0200 Subject: [PATCH 0043/1013] mmc: block: Fix missing blk_put_request() commit 34c089e806793a66e450b11bd167db6047399fcd upstream. Ensure blk_get_request() is paired with blk_put_request(). Fixes: 0493f6fe5bde ("mmc: block: Move boot partition locking into a driver op") Fixes: 627c3ccfb46a ("mmc: debugfs: Move block debugfs into block module") Signed-off-by: Adrian Hunter Reviewed-by: Linus Walleij Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/block.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index 2ad7b5c69156..e17994318562 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -207,6 +207,7 @@ static ssize_t power_ro_lock_store(struct device *dev, req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_BOOT_WP; blk_execute_rq(mq->queue, NULL, req, 0); ret = req_to_mmc_queue_req(req)->drv_op_result; + blk_put_request(req); if (!ret) { pr_info("%s: Locking boot partition ro until next power on\n", @@ -2321,6 +2322,7 @@ static int mmc_dbg_card_status_get(void *data, u64 *val) *val = ret; ret = 0; } + blk_put_request(req); return ret; } @@ -2351,6 +2353,7 @@ static int mmc_ext_csd_open(struct inode *inode, struct file *filp) req_to_mmc_queue_req(req)->drv_op_data = &ext_csd; blk_execute_rq(mq->queue, NULL, req, 0); err = req_to_mmc_queue_req(req)->drv_op_result; + blk_put_request(req); if (err) { pr_err("FAILED %d\n", err); goto out_free; From 957c27d45611ad4ef924840346bc338b8f3c870b Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 21 Nov 2017 15:42:28 +0200 Subject: [PATCH 0044/1013] mmc: block: Check return value of blk_get_request() commit fb8e456e547ed2c699f64665bd8a3b9bde7b9728 upstream. 
blk_get_request() can fail, always check the return value. Fixes: 0493f6fe5bde ("mmc: block: Move boot partition locking into a driver op") Fixes: 3ecd8cf23f88 ("mmc: block: move multi-ioctl() to use block layer") Fixes: 614f0388f580 ("mmc: block: move single ioctl() commands to block requests") Fixes: 627c3ccfb46a ("mmc: debugfs: Move block debugfs into block module") Signed-off-by: Adrian Hunter Reviewed-by: Linus Walleij Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/block.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index e17994318562..a20e29abb374 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -204,6 +204,10 @@ static ssize_t power_ro_lock_store(struct device *dev, /* Dispatch locking to the block layer */ req = blk_get_request(mq->queue, REQ_OP_DRV_OUT, __GFP_RECLAIM); + if (IS_ERR(req)) { + count = PTR_ERR(req); + goto out_put; + } req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_BOOT_WP; blk_execute_rq(mq->queue, NULL, req, 0); ret = req_to_mmc_queue_req(req)->drv_op_result; @@ -220,7 +224,7 @@ static ssize_t power_ro_lock_store(struct device *dev, set_disk_ro(part_md->disk, 1); } } - +out_put: mmc_blk_put(md); return count; } @@ -581,6 +585,10 @@ static int mmc_blk_ioctl_cmd(struct mmc_blk_data *md, req = blk_get_request(mq->queue, idata->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN, __GFP_RECLAIM); + if (IS_ERR(req)) { + err = PTR_ERR(req); + goto cmd_done; + } idatas[0] = idata; req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL; req_to_mmc_queue_req(req)->drv_op_data = idatas; @@ -644,6 +652,10 @@ static int mmc_blk_ioctl_multi_cmd(struct mmc_blk_data *md, req = blk_get_request(mq->queue, idata[0]->ic.write_flag ? REQ_OP_DRV_OUT : REQ_OP_DRV_IN, __GFP_RECLAIM); + if (IS_ERR(req)) { + err = PTR_ERR(req); + goto cmd_err; + } req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_IOCTL; req_to_mmc_queue_req(req)->drv_op_data = idata; req_to_mmc_queue_req(req)->ioc_count = num_of_cmds; @@ -2315,6 +2327,8 @@ static int mmc_dbg_card_status_get(void *data, u64 *val) /* Ask the block layer about the card status */ req = blk_get_request(mq->queue, REQ_OP_DRV_IN, __GFP_RECLAIM); + if (IS_ERR(req)) + return PTR_ERR(req); req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_CARD_STATUS; blk_execute_rq(mq->queue, NULL, req, 0); ret = req_to_mmc_queue_req(req)->drv_op_result; @@ -2349,6 +2363,10 @@ static int mmc_ext_csd_open(struct inode *inode, struct file *filp) /* Ask the block layer for the EXT CSD */ req = blk_get_request(mq->queue, REQ_OP_DRV_IN, __GFP_RECLAIM); + if (IS_ERR(req)) { + err = PTR_ERR(req); + goto out_free; + } req_to_mmc_queue_req(req)->drv_op = MMC_DRV_OP_GET_EXT_CSD; req_to_mmc_queue_req(req)->drv_op_data = &ext_csd; blk_execute_rq(mq->queue, NULL, req, 0); From 94497458fbb519dbc230e5156f58d4b3a49ba0c1 Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 21 Nov 2017 15:42:29 +0200 Subject: [PATCH 0045/1013] mmc: core: Do not leave the block driver in a suspended state commit ebe7dd45cf49e3b49cacbaace17f9f878f21fbea upstream. The block driver must be resumed if the mmc bus fails to suspend the card. 
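[ Editor's note, not part of the patch: sketch of the resulting flow in mmc_bus_suspend(), assuming the pm_generic_suspend() call that precedes the hunk's context lines. ]

	ret = pm_generic_suspend(dev);		/* suspend the block driver first */
	if (ret)
		return ret;

	ret = host->bus_ops->suspend(host);	/* then suspend the card via the bus ops */
	if (ret)
		pm_generic_resume(dev);		/* undo the block-driver suspend on failure */

	return ret;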
Signed-off-by: Adrian Hunter Reviewed-by: Linus Walleij Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/bus.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/mmc/core/bus.c b/drivers/mmc/core/bus.c index 301246513a37..7f428e387de3 100644 --- a/drivers/mmc/core/bus.c +++ b/drivers/mmc/core/bus.c @@ -157,6 +157,9 @@ static int mmc_bus_suspend(struct device *dev) return ret; ret = host->bus_ops->suspend(host); + if (ret) + pm_generic_resume(dev); + return ret; } From 64d1df6f018e728774bdf5d1aeb576d1c8d1872d Mon Sep 17 00:00:00 2001 From: Adrian Hunter Date: Tue, 21 Nov 2017 15:42:30 +0200 Subject: [PATCH 0046/1013] mmc: block: Ensure that debugfs files are removed commit f9f0da98819503b06b35e61869d18cf3a8cd3323 upstream. The card is not necessarily being removed, but the debugfs files must be removed when the driver is removed, otherwise they will continue to exist after unbinding the card from the driver. e.g. # echo "mmc1:0001" > /sys/bus/mmc/drivers/mmcblk/unbind # cat /sys/kernel/debug/mmc1/mmc1\:0001/ext_csd [ 173.634584] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050 [ 173.643356] IP: mmc_ext_csd_open+0x5e/0x170 A complication is that the debugfs_root may have already been removed, so check for that too. Fixes: 627c3ccfb46a ("mmc: debugfs: Move block debugfs into block module") Signed-off-by: Adrian Hunter Reviewed-by: Linus Walleij Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/block.c | 44 ++++++++++++++++++++++++++++++++------ drivers/mmc/core/debugfs.c | 1 + 2 files changed, 38 insertions(+), 7 deletions(-) diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index a20e29abb374..ccb516f18d72 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -119,6 +119,10 @@ struct mmc_blk_data { struct device_attribute force_ro; struct device_attribute power_ro_lock; int area_type; + + /* debugfs files (only in main mmc_blk_data) */ + struct dentry *status_dentry; + struct dentry *ext_csd_dentry; }; static DEFINE_MUTEX(open_lock); @@ -2417,7 +2421,7 @@ static const struct file_operations mmc_dbg_ext_csd_fops = { .llseek = default_llseek, }; -static int mmc_blk_add_debugfs(struct mmc_card *card) +static int mmc_blk_add_debugfs(struct mmc_card *card, struct mmc_blk_data *md) { struct dentry *root; @@ -2427,28 +2431,53 @@ static int mmc_blk_add_debugfs(struct mmc_card *card) root = card->debugfs_root; if (mmc_card_mmc(card) || mmc_card_sd(card)) { - if (!debugfs_create_file("status", S_IRUSR, root, card, - &mmc_dbg_card_status_fops)) + md->status_dentry = + debugfs_create_file("status", S_IRUSR, root, card, + &mmc_dbg_card_status_fops); + if (!md->status_dentry) return -EIO; } if (mmc_card_mmc(card)) { - if (!debugfs_create_file("ext_csd", S_IRUSR, root, card, - &mmc_dbg_ext_csd_fops)) + md->ext_csd_dentry = + debugfs_create_file("ext_csd", S_IRUSR, root, card, + &mmc_dbg_ext_csd_fops); + if (!md->ext_csd_dentry) return -EIO; } return 0; } +static void mmc_blk_remove_debugfs(struct mmc_card *card, + struct mmc_blk_data *md) +{ + if (!card->debugfs_root) + return; + + if (!IS_ERR_OR_NULL(md->status_dentry)) { + debugfs_remove(md->status_dentry); + md->status_dentry = NULL; + } + + if (!IS_ERR_OR_NULL(md->ext_csd_dentry)) { + debugfs_remove(md->ext_csd_dentry); + md->ext_csd_dentry = NULL; + } +} #else -static int mmc_blk_add_debugfs(struct mmc_card *card) +static int mmc_blk_add_debugfs(struct mmc_card *card, struct mmc_blk_data *md) { return 0; } +static void 
mmc_blk_remove_debugfs(struct mmc_card *card, + struct mmc_blk_data *md) +{ +} + #endif /* CONFIG_DEBUG_FS */ static int mmc_blk_probe(struct mmc_card *card) @@ -2488,7 +2517,7 @@ static int mmc_blk_probe(struct mmc_card *card) } /* Add two debugfs entries */ - mmc_blk_add_debugfs(card); + mmc_blk_add_debugfs(card, md); pm_runtime_set_autosuspend_delay(&card->dev, 3000); pm_runtime_use_autosuspend(&card->dev); @@ -2514,6 +2543,7 @@ static void mmc_blk_remove(struct mmc_card *card) { struct mmc_blk_data *md = dev_get_drvdata(&card->dev); + mmc_blk_remove_debugfs(card, md); mmc_blk_remove_parts(card, md); pm_runtime_get_sync(&card->dev); mmc_claim_host(card->host); diff --git a/drivers/mmc/core/debugfs.c b/drivers/mmc/core/debugfs.c index 01e459a34f33..0f4a7d7b2626 100644 --- a/drivers/mmc/core/debugfs.c +++ b/drivers/mmc/core/debugfs.c @@ -314,4 +314,5 @@ void mmc_add_card_debugfs(struct mmc_card *card) void mmc_remove_card_debugfs(struct mmc_card *card) { debugfs_remove_recursive(card->debugfs_root); + card->debugfs_root = NULL; } From e6aa6389263d0db52ea674ed45eb591d21585eca Mon Sep 17 00:00:00 2001 From: Bastian Stender Date: Tue, 28 Nov 2017 09:24:06 +0100 Subject: [PATCH 0047/1013] mmc: core: prepend 0x to pre_eol_info entry in sysfs commit 80a780a167d9267c72867b806142bd6ec69ba123 upstream. The sysfs entry "pre_eol_info" was missing the 0x prefix to identify it as hex formatted. Fixes: 46bc5c408e4e ("mmc: core: Export device lifetime information through sysfs") Signed-off-by: Bastian Stender Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/mmc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index 36217ad5e9b1..b935ad895701 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -780,7 +780,7 @@ MMC_DEV_ATTR(manfid, "0x%06x\n", card->cid.manfid); MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name); MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid); MMC_DEV_ATTR(prv, "0x%x\n", card->cid.prv); -MMC_DEV_ATTR(pre_eol_info, "%02x\n", card->ext_csd.pre_eol_info); +MMC_DEV_ATTR(pre_eol_info, "0x%02x\n", card->ext_csd.pre_eol_info); MMC_DEV_ATTR(life_time, "0x%02x 0x%02x\n", card->ext_csd.device_life_time_est_typ_a, card->ext_csd.device_life_time_est_typ_b); From 1ef1173c4864c06b6104f386c2694aa740cf02ee Mon Sep 17 00:00:00 2001 From: Bastian Stender Date: Tue, 28 Nov 2017 09:24:07 +0100 Subject: [PATCH 0048/1013] mmc: core: prepend 0x to OCR entry in sysfs commit c892b0d81705c566f575e489efc3c50762db1bde upstream. The sysfs entry "ocr" was missing the 0x prefix to identify it as hex formatted. 
Fixes: 5fb06af7a33b ("mmc: core: Extend sysfs with OCR register") Signed-off-by: Bastian Stender [Ulf: Amended change to also cover SD-cards] Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/mmc.c | 2 +- drivers/mmc/core/sd.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index b935ad895701..bad5c1bf4ed9 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -790,7 +790,7 @@ MMC_DEV_ATTR(enhanced_area_offset, "%llu\n", MMC_DEV_ATTR(enhanced_area_size, "%u\n", card->ext_csd.enhanced_area_size); MMC_DEV_ATTR(raw_rpmb_size_mult, "%#x\n", card->ext_csd.raw_rpmb_size_mult); MMC_DEV_ATTR(rel_sectors, "%#x\n", card->ext_csd.rel_sectors); -MMC_DEV_ATTR(ocr, "%08x\n", card->ocr); +MMC_DEV_ATTR(ocr, "0x%08x\n", card->ocr); MMC_DEV_ATTR(cmdq_en, "%d\n", card->ext_csd.cmdq_en); static ssize_t mmc_fwrev_show(struct device *dev, diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c index 4fd1620b732d..eb9de2134967 100644 --- a/drivers/mmc/core/sd.c +++ b/drivers/mmc/core/sd.c @@ -675,7 +675,7 @@ MMC_DEV_ATTR(manfid, "0x%06x\n", card->cid.manfid); MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name); MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid); MMC_DEV_ATTR(serial, "0x%08x\n", card->cid.serial); -MMC_DEV_ATTR(ocr, "%08x\n", card->ocr); +MMC_DEV_ATTR(ocr, "0x%08x\n", card->ocr); static ssize_t mmc_dsr_show(struct device *dev, From 450d75882552d59bb03b1ccb0999e98f5b227532 Mon Sep 17 00:00:00 2001 From: Lv Zheng Date: Tue, 26 Sep 2017 16:54:09 +0800 Subject: [PATCH 0049/1013] ACPI / EC: Fix regression related to PM ops support in ECDT device commit a64a62ce9a380213dc9e192f762266d70c9b40ec upstream. On platforms (ASUS X550ZE and possibly all ASUS X series) with valid ECDT EC but invalid DSDT EC, EC PM ops won't be invoked as ECDT EC is not an ACPI device. Thus the following commit actually removed post-resume acpi_ec_enable_event() invocation for such platforms, and triggered a regression on them that after being resumed, EC (actually should be ECDT) driver stops handling EC events: Commit: c2b46d679b30c5c0d7eb47a21085943242bdd8dc Subject: ACPI / EC: Add PM operations to improve event handling for resume process Notice that the root cause actually is "ECDT is not an ACPI device" rather than "the timing of acpi_ec_enable_event() invocation", this patch fixes this issue by enumerating ECDT EC as an ACPI device. Due to the existence of the noirq stage, the ability of tuning the timing of acpi_ec_enable_event() invocation is still meaningful. This patch is a little bit different from the posted fix by moving acpi_config_boot_ec() from acpi_ec_ecdt_start() to acpi_ec_add() to make sure that EC event handling won't be stopped as long as the ACPI EC driver is bound. Thus the following sequence shouldn't disable EC event handling: unbind,suspend,resume,bind. Fixes: c2b46d679b30 (ACPI / EC: Add PM operations to improve event handling for resume process) Link: https://bugzilla.kernel.org/show_bug.cgi?id=196847 Reported-by: Luya Tshimbalanga Tested-by: Luya Tshimbalanga Signed-off-by: Lv Zheng Signed-off-by: Rafael J. 
Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/ec.c | 69 ++++++++++++++++++++++++------------- drivers/acpi/internal.h | 1 + drivers/acpi/scan.c | 21 +++++++++++ include/acpi/acpi_bus.h | 1 + include/acpi/acpi_drivers.h | 1 + 5 files changed, 69 insertions(+), 24 deletions(-) diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c index 82b3ce5e937e..df842465634a 100644 --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -1597,32 +1597,41 @@ static int acpi_ec_add(struct acpi_device *device) { struct acpi_ec *ec = NULL; int ret; + bool is_ecdt = false; + acpi_status status; strcpy(acpi_device_name(device), ACPI_EC_DEVICE_NAME); strcpy(acpi_device_class(device), ACPI_EC_CLASS); - ec = acpi_ec_alloc(); - if (!ec) - return -ENOMEM; - if (ec_parse_device(device->handle, 0, ec, NULL) != - AE_CTRL_TERMINATE) { + if (!strcmp(acpi_device_hid(device), ACPI_ECDT_HID)) { + is_ecdt = true; + ec = boot_ec; + } else { + ec = acpi_ec_alloc(); + if (!ec) + return -ENOMEM; + status = ec_parse_device(device->handle, 0, ec, NULL); + if (status != AE_CTRL_TERMINATE) { ret = -EINVAL; goto err_alloc; + } } if (acpi_is_boot_ec(ec)) { - boot_ec_is_ecdt = false; - /* - * Trust PNP0C09 namespace location rather than ECDT ID. - * - * But trust ECDT GPE rather than _GPE because of ASUS quirks, - * so do not change boot_ec->gpe to ec->gpe. - */ - boot_ec->handle = ec->handle; - acpi_handle_debug(ec->handle, "duplicated.\n"); - acpi_ec_free(ec); - ec = boot_ec; - ret = acpi_config_boot_ec(ec, ec->handle, true, false); + boot_ec_is_ecdt = is_ecdt; + if (!is_ecdt) { + /* + * Trust PNP0C09 namespace location rather than + * ECDT ID. But trust ECDT GPE rather than _GPE + * because of ASUS quirks, so do not change + * boot_ec->gpe to ec->gpe. + */ + boot_ec->handle = ec->handle; + acpi_handle_debug(ec->handle, "duplicated.\n"); + acpi_ec_free(ec); + ec = boot_ec; + } + ret = acpi_config_boot_ec(ec, ec->handle, true, is_ecdt); } else ret = acpi_ec_setup(ec, true); if (ret) @@ -1635,8 +1644,10 @@ static int acpi_ec_add(struct acpi_device *device) ret = !!request_region(ec->command_addr, 1, "EC cmd"); WARN(!ret, "Could not request EC cmd io port 0x%lx", ec->command_addr); - /* Reprobe devices depending on the EC */ - acpi_walk_dep_device_list(ec->handle); + if (!is_ecdt) { + /* Reprobe devices depending on the EC */ + acpi_walk_dep_device_list(ec->handle); + } acpi_handle_debug(ec->handle, "enumerated.\n"); return 0; @@ -1692,6 +1703,7 @@ ec_parse_io_ports(struct acpi_resource *resource, void *context) static const struct acpi_device_id ec_device_ids[] = { {"PNP0C09", 0}, + {ACPI_ECDT_HID, 0}, {"", 0}, }; @@ -1764,11 +1776,14 @@ static int __init acpi_ec_ecdt_start(void) * Note: ec->handle can be valid if this function is called after * acpi_ec_add(), hence the fast path. */ - if (boot_ec->handle != ACPI_ROOT_OBJECT) - handle = boot_ec->handle; - else if (!acpi_ec_ecdt_get_handle(&handle)) - return -ENODEV; - return acpi_config_boot_ec(boot_ec, handle, true, true); + if (boot_ec->handle == ACPI_ROOT_OBJECT) { + if (!acpi_ec_ecdt_get_handle(&handle)) + return -ENODEV; + boot_ec->handle = handle; + } + + /* Register to ACPI bus with PM ops attached */ + return acpi_bus_register_early_device(ACPI_BUS_TYPE_ECDT_EC); } #if 0 @@ -2020,6 +2035,12 @@ int __init acpi_ec_init(void) /* Drivers must be started after acpi_ec_query_init() */ dsdt_fail = acpi_bus_register_driver(&acpi_ec_driver); + /* + * Register ECDT to ACPI bus only when PNP0C09 probe fails. 
This is + * useful for platforms (confirmed on ASUS X550ZE) with valid ECDT + * settings but invalid DSDT settings. + * https://bugzilla.kernel.org/show_bug.cgi?id=196847 + */ ecdt_fail = acpi_ec_ecdt_start(); return ecdt_fail && dsdt_fail ? -ENODEV : 0; } diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h index 4361c4415b4f..ede83d38beed 100644 --- a/drivers/acpi/internal.h +++ b/drivers/acpi/internal.h @@ -115,6 +115,7 @@ bool acpi_device_is_present(const struct acpi_device *adev); bool acpi_device_is_battery(struct acpi_device *adev); bool acpi_device_is_first_physical_node(struct acpi_device *adev, const struct device *dev); +int acpi_bus_register_early_device(int type); /* -------------------------------------------------------------------------- Device Matching and Notification diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c index 602f8ff212f2..2f2f50322ffb 100644 --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -1024,6 +1024,9 @@ static void acpi_device_get_busid(struct acpi_device *device) case ACPI_BUS_TYPE_SLEEP_BUTTON: strcpy(device->pnp.bus_id, "SLPF"); break; + case ACPI_BUS_TYPE_ECDT_EC: + strcpy(device->pnp.bus_id, "ECDT"); + break; default: acpi_get_name(device->handle, ACPI_SINGLE_NAME, &buffer); /* Clean up trailing underscores (if any) */ @@ -1304,6 +1307,9 @@ static void acpi_set_pnp_ids(acpi_handle handle, struct acpi_device_pnp *pnp, case ACPI_BUS_TYPE_SLEEP_BUTTON: acpi_add_id(pnp, ACPI_BUTTON_HID_SLEEPF); break; + case ACPI_BUS_TYPE_ECDT_EC: + acpi_add_id(pnp, ACPI_ECDT_HID); + break; } } @@ -2049,6 +2055,21 @@ void acpi_bus_trim(struct acpi_device *adev) } EXPORT_SYMBOL_GPL(acpi_bus_trim); +int acpi_bus_register_early_device(int type) +{ + struct acpi_device *device = NULL; + int result; + + result = acpi_add_single_object(&device, NULL, + type, ACPI_STA_DEFAULT); + if (result) + return result; + + device->flags.match_driver = true; + return device_attach(&device->dev); +} +EXPORT_SYMBOL_GPL(acpi_bus_register_early_device); + static int acpi_bus_scan_fixed(void) { int result = 0; diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index fa1505292f6c..324a04df3785 100644 --- a/include/acpi/acpi_bus.h +++ b/include/acpi/acpi_bus.h @@ -105,6 +105,7 @@ enum acpi_bus_device_type { ACPI_BUS_TYPE_THERMAL, ACPI_BUS_TYPE_POWER_BUTTON, ACPI_BUS_TYPE_SLEEP_BUTTON, + ACPI_BUS_TYPE_ECDT_EC, ACPI_BUS_DEVICE_TYPE_COUNT }; diff --git a/include/acpi/acpi_drivers.h b/include/acpi/acpi_drivers.h index 29c691265b49..14499757338f 100644 --- a/include/acpi/acpi_drivers.h +++ b/include/acpi/acpi_drivers.h @@ -58,6 +58,7 @@ #define ACPI_VIDEO_HID "LNXVIDEO" #define ACPI_BAY_HID "LNXIOBAY" #define ACPI_DOCK_HID "LNXDOCK" +#define ACPI_ECDT_HID "LNXEC" /* Quirk for broken IBM BIOSes */ #define ACPI_SMBUS_IBM_HID "SMBUSIBM" From 1ad53c57966a1a295c0ea5a8a7d6e85d3d75253a Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Mon, 27 Nov 2017 20:46:22 +0100 Subject: [PATCH 0050/1013] eeprom: at24: fix reading from 24MAC402/24MAC602 commit 644a1f19c6c8393d0c4168a5adf79056da6822eb upstream. Chip datasheet mentions that word addresses other than the actual start position of the MAC delivers undefined results. So fix this. Current implementation doesn't work due to this wrong offset. 
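The one-line change below encodes the layout described in the commit message: the EUI data ends just below word address 0xa0, so the start address is 0xa0 minus the length of the EUI array. A tiny illustrative helper (mac_word_address() is hypothetical; the driver open-codes the expression):

/* byte_len == 6 (EUI-48): 0xa0 - 6 = 0x9a; byte_len == 8 (EUI-64): 0xa0 - 8 = 0x98 */
static unsigned int mac_word_address(unsigned int byte_len, unsigned int offset)
{
	return 0xa0 - byte_len + offset;
}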
Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series") Signed-off-by: Heiner Kallweit Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/misc/eeprom/at24.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/misc/eeprom/at24.c b/drivers/misc/eeprom/at24.c index 764ff5df0dbc..bfd1a6f4ebf6 100644 --- a/drivers/misc/eeprom/at24.c +++ b/drivers/misc/eeprom/at24.c @@ -365,7 +365,8 @@ static ssize_t at24_eeprom_read_mac(struct at24_data *at24, char *buf, memset(msg, 0, sizeof(msg)); msg[0].addr = client->addr; msg[0].buf = addrbuf; - addrbuf[0] = 0x90 + offset; + /* EUI-48 starts from 0x9a, EUI-64 from 0x98 */ + addrbuf[0] = 0xa0 - at24->chip.byte_len + offset; msg[0].len = 1; msg[1].addr = client->addr; msg[1].flags = I2C_M_RD; From 656155d07658e84ac6ab6f6eceac67b807d6fedb Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Mon, 27 Nov 2017 22:06:13 +0100 Subject: [PATCH 0051/1013] eeprom: at24: correctly set the size for at24mac402 commit 5478e478eee3b096b8d998d4ed445da30da2dfbc upstream. There's an ilog2() expansion in AT24_DEVICE_MAGIC() which rounds down the actual size of EUI-48 byte array in at24mac402 eeproms to 4 from 6, making it impossible to read it all. Fix it by manually adjusting the value in probe(). This patch contains a temporary fix that is suitable for stable branches. Eventually we'll probably remove the call to ilog2() while converting the magic values to actual structs. Fixes: 0b813658c115 ("eeprom: at24: add support for at24mac series") Signed-off-by: Bartosz Golaszewski Reviewed-by: Andy Shevchenko Signed-off-by: Greg Kroah-Hartman --- drivers/misc/eeprom/at24.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/drivers/misc/eeprom/at24.c b/drivers/misc/eeprom/at24.c index bfd1a6f4ebf6..e0d018e8bb8e 100644 --- a/drivers/misc/eeprom/at24.c +++ b/drivers/misc/eeprom/at24.c @@ -632,6 +632,16 @@ static int at24_probe(struct i2c_client *client, const struct i2c_device_id *id) dev_warn(&client->dev, "page_size looks suspicious (no power of 2)!\n"); + /* + * REVISIT: the size of the EUI-48 byte array is 6 in at24mac402, while + * the call to ilog2() in AT24_DEVICE_MAGIC() rounds it down to 4. + * + * Eventually we'll get rid of the magic values altoghether in favor of + * real structs, but for now just manually set the right size. + */ + if (chip.flags & AT24_FLAG_MAC && chip.byte_len == 4) + chip.byte_len = 6; + /* Use I2C operations unless we're stuck with SMBus extensions. */ if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) { if (chip.flags & AT24_FLAG_ADDR16) From 52d9bb958f080f0a548139676dbf6a87a40af2a0 Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Fri, 24 Nov 2017 07:47:50 +0100 Subject: [PATCH 0052/1013] eeprom: at24: check at24_read/write arguments commit d9bcd462daf34aebb8de9ad7f76de0198bb5a0f0 upstream. So far we completely rely on the caller to provide valid arguments. To be on the safe side perform an own sanity check. 
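The sanity check added below is a plain bounds test against the device size. Sketched as a standalone helper (at24_check_bounds() is hypothetical; the patch open-codes the test in at24_read() and at24_write()):

/* Reject any access that would run past the end of the EEPROM. */
static int at24_check_bounds(unsigned int off, size_t count, unsigned int byte_len)
{
	if (off + count > byte_len)
		return -EINVAL;

	return 0;
}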
Signed-off-by: Heiner Kallweit Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/misc/eeprom/at24.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/misc/eeprom/at24.c b/drivers/misc/eeprom/at24.c index e0d018e8bb8e..372b2060fbba 100644 --- a/drivers/misc/eeprom/at24.c +++ b/drivers/misc/eeprom/at24.c @@ -507,6 +507,9 @@ static int at24_read(void *priv, unsigned int off, void *val, size_t count) if (unlikely(!count)) return count; + if (off + count > at24->chip.byte_len) + return -EINVAL; + /* * Read data from chip, protecting against concurrent updates * from this host, but not from other I2C masters. @@ -539,6 +542,9 @@ static int at24_write(void *priv, unsigned int off, void *val, size_t count) if (unlikely(!count)) return -EINVAL; + if (off + count > at24->chip.byte_len) + return -EINVAL; + /* * Write data to chip, protecting against concurrent updates * from this host, but not from other I2C masters. From 9c71f896bb8f3b7bb228ece70341b26b2ff32ae9 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 22 Nov 2017 12:28:17 +0100 Subject: [PATCH 0053/1013] i2c: i801: Fix Failed to allocate irq -2147483648 error commit 6e0c9507bf51e1517a80ad0ac171e5402528fcef upstream. On Apollo Lake devices the BIOS does not set up IRQ routing for the i801 SMBUS controller IRQ, so we end up with dev->irq set to IRQ_NOTCONNECTED. Detect this and do not try to use the irq in this case silencing: i801_smbus 0000:00:1f.1: Failed to allocate irq -2147483648: -107 BugLink: https://communities.intel.com/thread/114759 Signed-off-by: Hans de Goede Reviewed-by: Jean Delvare Signed-off-by: Wolfram Sang Signed-off-by: Greg Kroah-Hartman --- drivers/i2c/busses/i2c-i801.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/i2c/busses/i2c-i801.c b/drivers/i2c/busses/i2c-i801.c index 9e12a53ef7b8..8eac00efadc1 100644 --- a/drivers/i2c/busses/i2c-i801.c +++ b/drivers/i2c/busses/i2c-i801.c @@ -1617,6 +1617,9 @@ static int i801_probe(struct pci_dev *dev, const struct pci_device_id *id) /* Default timeout in interrupt mode: 200 ms */ priv->adapter.timeout = HZ / 5; + if (dev->irq == IRQ_NOTCONNECTED) + priv->features &= ~FEATURE_IRQ; + if (priv->features & FEATURE_IRQ) { u16 pcictl, pcists; From 6e8aca2095790b90217b759a0d5e2ec5138bbfd2 Mon Sep 17 00:00:00 2001 From: Vaibhav Jain Date: Thu, 23 Nov 2017 09:08:57 +0530 Subject: [PATCH 0054/1013] cxl: Check if vphb exists before iterating over AFU devices commit 12841f87b7a8ceb3d54f171660f72a86941bfcb3 upstream. During an eeh a kernel-oops is reported if no vPHB is allocated to the AFU. This happens as during AFU init, an error in creation of vPHB is a non-fatal error. Hence afu->phb should always be checked for NULL before iterating over it for the virtual AFU pci devices. This patch fixes the kenel-oops by adding a NULL pointer check for afu->phb before it is dereferenced. 
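The guard added in each EEH handler below has the same simple shape; schematically (handler body elided):

	/* Skip any AFU that never got a virtual PHB assigned. */
	if (afu->phb == NULL)
		return result;	/* or continue; inside the per-slice loops */

	list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) {
		/* notify the AFU device driver's error handler */
	}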
Fixes: 9e8df8a21963 ("cxl: EEH support") Signed-off-by: Vaibhav Jain Acked-by: Andrew Donnellan Acked-by: Frederic Barrat Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- drivers/misc/cxl/pci.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/drivers/misc/cxl/pci.c b/drivers/misc/cxl/pci.c index 3ba04f371380..81093f8157a9 100644 --- a/drivers/misc/cxl/pci.c +++ b/drivers/misc/cxl/pci.c @@ -2043,6 +2043,9 @@ static pci_ers_result_t cxl_vphb_error_detected(struct cxl_afu *afu, /* There should only be one entry, but go through the list * anyway */ + if (afu->phb == NULL) + return result; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { if (!afu_dev->driver) continue; @@ -2084,8 +2087,7 @@ static pci_ers_result_t cxl_pci_error_detected(struct pci_dev *pdev, * Tell the AFU drivers; but we don't care what they * say, we're going away. */ - if (afu->phb != NULL) - cxl_vphb_error_detected(afu, state); + cxl_vphb_error_detected(afu, state); } return PCI_ERS_RESULT_DISCONNECT; } @@ -2225,6 +2227,9 @@ static pci_ers_result_t cxl_pci_slot_reset(struct pci_dev *pdev) if (cxl_afu_select_best_mode(afu)) goto err; + if (afu->phb == NULL) + continue; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { /* Reset the device context. * TODO: make this less disruptive @@ -2287,6 +2292,9 @@ static void cxl_pci_resume(struct pci_dev *pdev) for (i = 0; i < adapter->slices; i++) { afu = adapter->afu[i]; + if (afu->phb == NULL) + continue; + list_for_each_entry(afu_dev, &afu->phb->bus->devices, bus_list) { if (afu_dev->driver && afu_dev->driver->err_handler && afu_dev->driver->err_handler->resume) From f5b41b366273ff51c7a62643f8667784eb349fba Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Fri, 24 Nov 2017 15:14:25 -0800 Subject: [PATCH 0055/1013] bcache: Fix building error on MIPS commit cf33c1ee5254c6a430bc1538232b49c3ea13e613 upstream. This patch try to fix the building error on MIPS. The reason is MIPS has already defined the PTR macro, which conflicts with the PTR macro in include/uapi/linux/bcache.h. 
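A contrived illustration of the clash (the real MIPS definition lives in an arch header and is not reproduced here; this is only to make the conflict concrete):

#define PTR .dword				/* hypothetical arch-side macro */

/* A later function-like macro of the same name is a redefinition error: */
#define PTR(gen, offset, dev) \
	((((__u64) dev) << 51) | ((__u64) offset) << 8 | gen)

/* so the uapi macro is renamed to MAKE_PTR and the bcache call sites follow. */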
[fixed by mlyle: corrected a line-length issue] Signed-off-by: Huacai Chen Reviewed-by: Michael Lyle Signed-off-by: Michael Lyle Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/md/bcache/alloc.c | 2 +- drivers/md/bcache/extents.c | 2 +- drivers/md/bcache/journal.c | 2 +- include/uapi/linux/bcache.h | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/md/bcache/alloc.c b/drivers/md/bcache/alloc.c index c9934139d609..934b1fce4ce1 100644 --- a/drivers/md/bcache/alloc.c +++ b/drivers/md/bcache/alloc.c @@ -480,7 +480,7 @@ int __bch_bucket_alloc_set(struct cache_set *c, unsigned reserve, if (b == -1) goto err; - k->ptr[i] = PTR(ca->buckets[b].gen, + k->ptr[i] = MAKE_PTR(ca->buckets[b].gen, bucket_to_sector(c, b), ca->sb.nr_this_dev); diff --git a/drivers/md/bcache/extents.c b/drivers/md/bcache/extents.c index 41c238fc3733..f9d391711595 100644 --- a/drivers/md/bcache/extents.c +++ b/drivers/md/bcache/extents.c @@ -585,7 +585,7 @@ static bool bch_extent_merge(struct btree_keys *bk, struct bkey *l, struct bkey return false; for (i = 0; i < KEY_PTRS(l); i++) - if (l->ptr[i] + PTR(0, KEY_SIZE(l), 0) != r->ptr[i] || + if (l->ptr[i] + MAKE_PTR(0, KEY_SIZE(l), 0) != r->ptr[i] || PTR_BUCKET_NR(b->c, l, i) != PTR_BUCKET_NR(b->c, r, i)) return false; diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c index 02a98ddb592d..03cc0722ae48 100644 --- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -507,7 +507,7 @@ static void journal_reclaim(struct cache_set *c) continue; ja->cur_idx = next; - k->ptr[n++] = PTR(0, + k->ptr[n++] = MAKE_PTR(0, bucket_to_sector(c, ca->sb.d[ja->cur_idx]), ca->sb.nr_this_dev); } diff --git a/include/uapi/linux/bcache.h b/include/uapi/linux/bcache.h index 90fc490f973f..821f71a2e48f 100644 --- a/include/uapi/linux/bcache.h +++ b/include/uapi/linux/bcache.h @@ -91,7 +91,7 @@ PTR_FIELD(PTR_GEN, 0, 8) #define PTR_CHECK_DEV ((1 << PTR_DEV_BITS) - 1) -#define PTR(gen, offset, dev) \ +#define MAKE_PTR(gen, offset, dev) \ ((((__u64) dev) << 51) | ((__u64) offset) << 8 | gen) /* Bkey utility code */ From 260c6625c85e7e1a5db0b5ece9944a4010c9bade Mon Sep 17 00:00:00 2001 From: Coly Li Date: Mon, 30 Oct 2017 14:46:31 -0700 Subject: [PATCH 0056/1013] bcache: only permit to recovery read error when cache device is clean commit d59b23795933678c9638fd20c942d2b4f3cd6185 upstream. When bcache does read I/Os, for example in writeback or writethrough mode, if a read request on cache device is failed, bcache will try to recovery the request by reading from cached device. If the data on cached device is not synced with cache device, then requester will get a stale data. For critical storage system like database, providing stale data from recovery may result an application level data corruption, which is unacceptible. With this patch, for a failed read request in writeback or writethrough mode, recovery a recoverable read request only happens when cache device is clean. That is to say, all data on cached device is up to update. For other cache modes in bcache, read request will never hit cached_dev_read_error(), they don't need this patch. Please note, because cache mode can be switched arbitrarily in run time, a writethrough mode might be switched from a writeback mode. Therefore checking dc->has_data in writethrough mode still makes sense. Changelog: V4: Fix parens error pointed by Michael Lyle. v3: By response from Kent Oversteet, he thinks recovering stale data is a bug to fix, and option to permit it is unnecessary. 
So this version the sysfs file is removed. v2: rename sysfs entry from allow_stale_data_on_failure to allow_stale_data_on_failure, and fix the confusing commit log. v1: initial patch posted. [small change to patch comment spelling by mlyle] Signed-off-by: Coly Li Signed-off-by: Michael Lyle Reported-by: Arne Wolf Reviewed-by: Michael Lyle Cc: Kent Overstreet Cc: Nix Cc: Kai Krakow Cc: Eric Wheeler Cc: Junhui Tang Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/md/bcache/request.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 3475d6628e21..14c269241b34 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -698,8 +698,16 @@ static void cached_dev_read_error(struct closure *cl) { struct search *s = container_of(cl, struct search, cl); struct bio *bio = &s->bio.bio; + struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); - if (s->recoverable) { + /* + * If cache device is dirty (dc->has_dirty is non-zero), then + * recovery a failed read request from cached device may get a + * stale data back. So read failure recovery is only permitted + * when cache device is clean. + */ + if (s->recoverable && + (dc && !atomic_read(&dc->has_dirty))) { /* Retry from the backing device: */ trace_bcache_read_retry(s->orig_bio); From 1cafc451a955c44ec7eba47bae340d9ca5ccb483 Mon Sep 17 00:00:00 2001 From: Rui Hua Date: Fri, 24 Nov 2017 15:14:26 -0800 Subject: [PATCH 0057/1013] bcache: recover data from backing when data is clean commit e393aa2446150536929140739f09c6ecbcbea7f0 upstream. When we send a read request and hit the clean data in cache device, there is a situation called cache read race in bcache(see the commit in the tail of cache_look_up(), the following explaination just copy from there): The bucket we're reading from might be reused while our bio is in flight, and we could then end up reading the wrong data. We guard against this by checking (in bch_cache_read_endio()) if the pointer is stale again; if so, we treat it as an error (s->iop.error = -EINTR) and reread from the backing device (but we don't pass that error up anywhere) It should be noted that cache read race happened under normal circumstances, not the circumstance when SSD failed, it was counted and shown in /sys/fs/bcache/XXX/internal/cache_read_races. Without this patch, when we use writeback mode, we will never reread from the backing device when cache read race happened, until the whole cache device is clean, because the condition (s->recoverable && (dc && !atomic_read(&dc->has_dirty))) is false in cached_dev_read_error(). In this situation, the s->iop.error(= -EINTR) will be passed up, at last, user will receive -EINTR when it's bio end, this is not suitable, and wield to up-application. In this patch, we use s->read_dirty_data to judge whether the read request hit dirty data in cache device, it is safe to reread data from the backing device when the read request hit clean data. This can not only handle cache read race, but also recover data when failed read request from cache device. 
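With this change the recovery decision in cached_dev_read_error() reduces to a single condition; schematically:

	/* Retry from the backing device only when no dirty cache data was read,
	 * which also covers the cache read race described above. */
	if (s->recoverable && !s->read_dirty_data) {
		/* reread s->orig_bio from the backing device */
	} else {
		/* pass the error up to the requester */
	}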
[edited by mlyle to fix up whitespace, commit log title, comment spelling] Fixes: d59b23795933 ("bcache: only permit to recovery read error when cache device is clean") Signed-off-by: Hua Rui Reviewed-by: Michael Lyle Reviewed-by: Coly Li Signed-off-by: Michael Lyle Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- drivers/md/bcache/request.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 14c269241b34..14d13cab5cda 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -698,16 +698,15 @@ static void cached_dev_read_error(struct closure *cl) { struct search *s = container_of(cl, struct search, cl); struct bio *bio = &s->bio.bio; - struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); /* - * If cache device is dirty (dc->has_dirty is non-zero), then - * recovery a failed read request from cached device may get a - * stale data back. So read failure recovery is only permitted - * when cache device is clean. + * If read request hit dirty data (s->read_dirty_data is true), + * then recovery a failed read request from cached device may + * get a stale data back. So read failure recovery is only + * permitted when read request hit clean data in cache device, + * or when cache read race happened. */ - if (s->recoverable && - (dc && !atomic_read(&dc->has_dirty))) { + if (s->recoverable && !s->read_dirty_data) { /* Retry from the backing device: */ trace_bcache_read_retry(s->orig_bio); From de120fc9624162a3507091d6b92dd9037440c9d8 Mon Sep 17 00:00:00 2001 From: Peter Rosin Date: Mon, 27 Nov 2017 17:31:00 +0100 Subject: [PATCH 0058/1013] hwmon: (jc42) optionally try to disable the SMBUS timeout commit 68615eb01f82256c19e41967bfb3eef902f77033 upstream. With a nxp,se97 chip on an atmel sama5d31 board, the I2C adapter driver is not always capable of avoiding the 25-35 ms timeout as specified by the SMBUS protocol. This may cause silent corruption of the last bit of any transfer, e.g. a one is read instead of a zero if the sensor chip times out. This also affects the eeprom half of the nxp-se97 chip, where this silent corruption was originally noticed. Other I2C adapters probably suffer similar issues, e.g. bit-banging comes to mind as risky... The SMBUS register in the nxp chip is not a standard Jedec register, but it is not special to the nxp chips either, at least the atmel chips have the same mechanism. Therefore, do not special case this on the manufacturer, it is opt-in via the device property anyway. Signed-off-by: Peter Rosin Acked-by: Rob Herring Signed-off-by: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- .../devicetree/bindings/hwmon/jc42.txt | 4 ++++ drivers/hwmon/jc42.c | 21 +++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/Documentation/devicetree/bindings/hwmon/jc42.txt b/Documentation/devicetree/bindings/hwmon/jc42.txt index 07a250498fbb..f569db58f64a 100644 --- a/Documentation/devicetree/bindings/hwmon/jc42.txt +++ b/Documentation/devicetree/bindings/hwmon/jc42.txt @@ -34,6 +34,10 @@ Required properties: - reg: I2C address +Optional properties: +- smbus-timeout-disable: When set, the smbus timeout function will be disabled. + This is not supported on all chips. + Example: temp-sensor@1a { diff --git a/drivers/hwmon/jc42.c b/drivers/hwmon/jc42.c index 5f11dc014ed6..e5234f953a6d 100644 --- a/drivers/hwmon/jc42.c +++ b/drivers/hwmon/jc42.c @@ -22,6 +22,7 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. 
*/ +#include #include #include #include @@ -45,6 +46,7 @@ static const unsigned short normal_i2c[] = { #define JC42_REG_TEMP 0x05 #define JC42_REG_MANID 0x06 #define JC42_REG_DEVICEID 0x07 +#define JC42_REG_SMBUS 0x22 /* NXP and Atmel, possibly others? */ /* Status bits in temperature register */ #define JC42_ALARM_CRIT_BIT 15 @@ -75,6 +77,9 @@ static const unsigned short normal_i2c[] = { #define GT_MANID 0x1c68 /* Giantec */ #define GT_MANID2 0x132d /* Giantec, 2nd mfg ID */ +/* SMBUS register */ +#define SMBUS_STMOUT BIT(7) /* SMBus time-out, active low */ + /* Supported chips */ /* Analog Devices */ @@ -495,6 +500,22 @@ static int jc42_probe(struct i2c_client *client, const struct i2c_device_id *id) data->extended = !!(cap & JC42_CAP_RANGE); + if (device_property_read_bool(dev, "smbus-timeout-disable")) { + int smbus; + + /* + * Not all chips support this register, but from a + * quick read of various datasheets no chip appears + * incompatible with the below attempt to disable + * the timeout. And the whole thing is opt-in... + */ + smbus = i2c_smbus_read_word_swapped(client, JC42_REG_SMBUS); + if (smbus < 0) + return smbus; + i2c_smbus_write_word_swapped(client, JC42_REG_SMBUS, + smbus | SMBUS_STMOUT); + } + config = i2c_smbus_read_word_swapped(client, JC42_REG_CONFIG); if (config < 0) return config; From 5e19169a88be15d6c845e93d62e448ee06a14b2f Mon Sep 17 00:00:00 2001 From: Jeff Lien Date: Tue, 21 Nov 2017 10:44:37 -0600 Subject: [PATCH 0059/1013] nvme-pci: add quirk for delay before CHK RDY for WDC SN200 commit 8c97eeccf0ad8783c057830119467b877bdfced7 upstream. And increase the existing delay to cover this device as well. Signed-off-by: Jeff Lien Signed-off-by: Christoph Hellwig Signed-off-by: Greg Kroah-Hartman --- drivers/nvme/host/nvme.h | 2 +- drivers/nvme/host/pci.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index d3f3c4447515..044af553204c 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -108,7 +108,7 @@ static inline struct nvme_request *nvme_req(struct request *req) * NVME_QUIRK_DELAY_BEFORE_CHK_RDY quirk enabled. The value (in ms) was * found empirically. */ -#define NVME_QUIRK_DELAY_AMOUNT 2000 +#define NVME_QUIRK_DELAY_AMOUNT 2300 enum nvme_ctrl_state { NVME_CTRL_NEW, diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 3f5a04c586ce..75539f7c58b9 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -2519,6 +2519,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_IDENTIFY_CNS, }, { PCI_DEVICE(0x1c58, 0x0003), /* HGST adapter */ .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, }, + { PCI_DEVICE(0x1c58, 0x0023), /* WDC SN200 adapter */ + .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, }, { PCI_DEVICE(0x1c5f, 0x0540), /* Memblaze Pblaze4 adapter */ .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY, }, { PCI_DEVICE(0x144d, 0xa821), /* Samsung PM1725 */ From 23506bc7fdb2313c63d8811967fcf82d5764b973 Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Tue, 14 Nov 2017 17:19:29 -0500 Subject: [PATCH 0060/1013] Revert "drm/radeon: dont switch vt on suspend" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 18c437caa5b18a235dd65cec224eab54bebcee65 upstream. Fixes distorted colors on some cards on resume from suspend. This reverts commit b9729b17a414f99c61f4db9ac9f9ed987fa0cbfe. 
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=98832 Bug: https://bugs.freedesktop.org/show_bug.cgi?id=99163 Bug: https://bugzilla.kernel.org/show_bug.cgi?id=107001 Reviewed-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/radeon/radeon_fb.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_fb.c b/drivers/gpu/drm/radeon/radeon_fb.c index fd25361ac681..4ef967d1a9de 100644 --- a/drivers/gpu/drm/radeon/radeon_fb.c +++ b/drivers/gpu/drm/radeon/radeon_fb.c @@ -245,7 +245,6 @@ static int radeonfb_create(struct drm_fb_helper *helper, } info->par = rfbdev; - info->skip_vt_switch = true; ret = radeon_framebuffer_init(rdev->ddev, &rfbdev->rfb, &mode_cmd, gobj); if (ret) { From 2d8b3a9b7853dbbf12c853674ef4403481ab1513 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Sat, 30 Sep 2017 11:13:28 +0300 Subject: [PATCH 0061/1013] drm/amdgpu: potential uninitialized variable in amdgpu_vce_ring_parse_cs() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 40a9960b046290939b56ce8e51f365258f27f264 upstream. We shifted some code around in commit 9cca0b8e5df0 ("drm/amdgpu: move amdgpu_cs_sysvm_access_required into find_mapping") and now my static checker complains that "r" might not be initialized at the end of the function. I've reviewed the code, and that seems possible, but it's also possible I may have missed something. Reviewed-by: Christian König Signed-off-by: Dan Carpenter Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c index c855366521ab..9fc3d387eae3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c @@ -647,7 +647,7 @@ int amdgpu_vce_ring_parse_cs(struct amdgpu_cs_parser *p, uint32_t ib_idx) uint32_t allocated = 0; uint32_t tmp, handle = 0; uint32_t *size = &tmp; - int i, r, idx = 0; + int i, r = 0, idx = 0; p->job->vm = NULL; ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo); From f182df2eacbe52100b82e193a7917fdbb4312dd1 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Sat, 30 Sep 2017 11:14:13 +0300 Subject: [PATCH 0062/1013] drm/amdgpu: Potential uninitialized variable in amdgpu_vm_update_directories() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 78aa02c713fcf19e9bc8511ab61a5fd6c877cc01 upstream. After commit ea09729c9302 ("drm/amdgpu: rework page directory filling v2") then it becomes a lot harder to verify that "r" is initialized. My static checker complains and so I've reviewed the code. It does look like it might be buggy... Anyway, it doesn't hurt to set "r" to zero at the start. 
Reviewed-by: Christian König Signed-off-by: Dan Carpenter Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index bd20ff018512..4fb5e14d4879 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -1201,7 +1201,7 @@ static void amdgpu_vm_invalidate_level(struct amdgpu_vm_pt *parent) int amdgpu_vm_update_directories(struct amdgpu_device *adev, struct amdgpu_vm *vm) { - int r; + int r = 0; r = amdgpu_vm_update_level(adev, vm, &vm->root, 0); if (r) From 402b1de75ce06404c96cf803e1fe0b5c685f7797 Mon Sep 17 00:00:00 2001 From: Ken Wang Date: Fri, 29 Sep 2017 15:41:43 +0800 Subject: [PATCH 0063/1013] drm/amdgpu: correct reference clock value on vega10 commit 76d6172b6fab16455af4b67bb18a3f66011592f8 upstream. Old value from bringup was wrong. Signed-off-by: Ken Wang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/soc15.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c index f2c3a49f73a0..3e59c766722c 100644 --- a/drivers/gpu/drm/amd/amdgpu/soc15.c +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c @@ -279,10 +279,7 @@ static void soc15_init_golden_registers(struct amdgpu_device *adev) } static u32 soc15_get_xclk(struct amdgpu_device *adev) { - if (adev->asic_type == CHIP_VEGA10) - return adev->clock.spll.reference_freq/4; - else - return adev->clock.spll.reference_freq; + return adev->clock.spll.reference_freq; } From a00b7ea3df52da50097d22c9eb3b0eba3e504838 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Tue, 31 Oct 2017 09:36:13 +0100 Subject: [PATCH 0064/1013] drm/amdgpu: fix error handling in amdgpu_bo_do_create MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit a695e43712242c354748e9bae5d137d4337a7694 upstream. The bo structure is freed up in case of an error, so we can't do any accounting if that happens. Signed-off-by: Christian König Reviewed-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 9e495da0bb03..ffe483980362 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -391,6 +391,9 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev, r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type, &bo->placement, page_align, !kernel, NULL, acc_size, sg, resv, &amdgpu_ttm_bo_destroy); + if (unlikely(r != 0)) + return r; + bytes_moved = atomic64_read(&adev->num_bytes_moved) - initial_bytes_moved; if (adev->mc.visible_vram_size < adev->mc.real_vram_size && @@ -400,9 +403,6 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev, else amdgpu_cs_report_moved_bytes(adev, bytes_moved, 0); - if (unlikely(r != 0)) - return r; - if (kernel) bo->tbo.priority = 1; From ecd08e684cd1d138bf894e632e0d141f5e7c4140 Mon Sep 17 00:00:00 2001 From: ozeng Date: Tue, 6 Jun 2017 10:53:28 -0500 Subject: [PATCH 0065/1013] drm/amdgpu: Properly allocate VM invalidate eng v2 commit c5066129af4436ab0da8eefe4289774a5409706d upstream. 
v1: Properly allocate TLB invalidation engine to avoid conflict. v2: Added comments to codes Signed-off-by: Oak Zeng Reviewed-by: Alex Deucher Acked-by: Christian Konig Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c index d04d0b123212..6dc0f6e346e7 100644 --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c @@ -395,7 +395,16 @@ static int gmc_v9_0_early_init(void *handle) static int gmc_v9_0_late_init(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; - unsigned vm_inv_eng[AMDGPU_MAX_VMHUBS] = { 3, 3 }; + /* + * The latest engine allocation on gfx9 is: + * Engine 0, 1: idle + * Engine 2, 3: firmware + * Engine 4~13: amdgpu ring, subject to change when ring number changes + * Engine 14~15: idle + * Engine 16: kfd tlb invalidation + * Engine 17: Gart flushes + */ + unsigned vm_inv_eng[AMDGPU_MAX_VMHUBS] = { 4, 4 }; unsigned i; for(i = 0; i < adev->num_rings; ++i) { @@ -408,9 +417,9 @@ static int gmc_v9_0_late_init(void *handle) ring->funcs->vmhub); } - /* Engine 17 is used for GART flushes */ + /* Engine 16 is used for KFD and 17 for GART flushes */ for(i = 0; i < AMDGPU_MAX_VMHUBS; ++i) - BUG_ON(vm_inv_eng[i] > 17); + BUG_ON(vm_inv_eng[i] > 16); return amdgpu_irq_get(adev, &adev->mc.vm_fault, 0); } From d48221f3399764748f706cc1126b5d6465e27d87 Mon Sep 17 00:00:00 2001 From: Ken Wang Date: Wed, 8 Nov 2017 14:48:50 +0800 Subject: [PATCH 0066/1013] drm/amdgpu: Remove check which is not valid for certain VBIOS commit ab6613b7eaefe85dadfc86025e901c55d71c0379 upstream. Fixes vbios fetching on certain headless boards. Signed-off-by: Ken Wang Acked-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c index c21adf60a7f2..057e1ecd83ce 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c @@ -59,12 +59,6 @@ static bool check_atom_bios(uint8_t *bios, size_t size) return false; } - tmp = bios[0x18] | (bios[0x19] << 8); - if (bios[tmp + 0x14] != 0x0) { - DRM_INFO("Not an x86 BIOS ROM\n"); - return false; - } - bios_header_start = bios[0x48] | (bios[0x49] << 8); if (!bios_header_start) { DRM_INFO("Can't locate bios header\n"); From 45d1765032b645a2e86a606f7d173d4bf97a9810 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Mon, 4 Sep 2017 20:58:45 +0200 Subject: [PATCH 0067/1013] drm/ttm: fix ttm_bo_cleanup_refs_or_queue once more MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 378e2d5b504fe0231c557751e58b80fcf717cc20 upstream. With shared reservation objects __ttm_bo_reserve() can easily fail even on destroyed BOs. This prevents correct handling when we need to individualize the reservation object. Fix this by individualizing the object before even trying to reserve it. 
Signed-off-by: Christian König Acked-by: Chunming Zhou Signed-off-by: Alex Deucher Signed-off-by: Lyude Paul Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/ttm/ttm_bo.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 180ce6296416..bee77d31895b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -440,28 +440,29 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) struct ttm_bo_global *glob = bo->glob; int ret; + ret = ttm_bo_individualize_resv(bo); + if (ret) { + /* Last resort, if we fail to allocate memory for the + * fences block for the BO to become idle + */ + reservation_object_wait_timeout_rcu(bo->resv, true, false, + 30 * HZ); + spin_lock(&glob->lru_lock); + goto error; + } + spin_lock(&glob->lru_lock); ret = __ttm_bo_reserve(bo, false, true, NULL); - if (!ret) { - if (!ttm_bo_wait(bo, false, true)) { + if (reservation_object_test_signaled_rcu(&bo->ttm_resv, true)) { ttm_bo_del_from_lru(bo); spin_unlock(&glob->lru_lock); + if (bo->resv != &bo->ttm_resv) + reservation_object_unlock(&bo->ttm_resv); ttm_bo_cleanup_memtype_use(bo); - return; } - ret = ttm_bo_individualize_resv(bo); - if (ret) { - /* Last resort, if we fail to allocate memory for the - * fences block for the BO to become idle and free it. - */ - spin_unlock(&glob->lru_lock); - ttm_bo_wait(bo, true, true); - ttm_bo_cleanup_memtype_use(bo); - return; - } ttm_bo_flush_all_fences(bo); /* @@ -474,11 +475,12 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) ttm_bo_add_to_lru(bo); } - if (bo->resv != &bo->ttm_resv) - reservation_object_unlock(&bo->ttm_resv); __ttm_bo_unreserve(bo); } + if (bo->resv != &bo->ttm_resv) + reservation_object_unlock(&bo->ttm_resv); +error: kref_get(&bo->list_kref); list_add_tail(&bo->ddestroy, &bdev->ddestroy); spin_unlock(&glob->lru_lock); From 509536b227fb35ed34d144ff4280aa78605ab85e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Mon, 4 Sep 2017 21:02:45 +0200 Subject: [PATCH 0068/1013] dma-buf: make reservation_object_copy_fences rcu save MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 39e16ba16c147e662bf9fbcee9a99d70d420382f upstream. Stop requiring that the src reservation object is locked for this operation. Acked-by: Chunming Zhou Signed-off-by: Christian König Signed-off-by: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/1504551766-5093-1-git-send-email-deathsimple@vodafone.de Signed-off-by: Lyude Paul Signed-off-by: Greg Kroah-Hartman --- drivers/dma-buf/reservation.c | 56 ++++++++++++++++++++++++++--------- 1 file changed, 42 insertions(+), 14 deletions(-) diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index dec3a815455d..b44d9d7db347 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -266,8 +266,7 @@ EXPORT_SYMBOL(reservation_object_add_excl_fence); * @dst: the destination reservation object * @src: the source reservation object * -* Copy all fences from src to dst. Both src->lock as well as dst-lock must be -* held. +* Copy all fences from src to dst. dst-lock must be held. 
*/ int reservation_object_copy_fences(struct reservation_object *dst, struct reservation_object *src) @@ -277,33 +276,62 @@ int reservation_object_copy_fences(struct reservation_object *dst, size_t size; unsigned i; - src_list = reservation_object_get_list(src); + rcu_read_lock(); + src_list = rcu_dereference(src->fence); +retry: if (src_list) { - size = offsetof(typeof(*src_list), - shared[src_list->shared_count]); + unsigned shared_count = src_list->shared_count; + + size = offsetof(typeof(*src_list), shared[shared_count]); + rcu_read_unlock(); + dst_list = kmalloc(size, GFP_KERNEL); if (!dst_list) return -ENOMEM; - dst_list->shared_count = src_list->shared_count; - dst_list->shared_max = src_list->shared_count; - for (i = 0; i < src_list->shared_count; ++i) - dst_list->shared[i] = - dma_fence_get(src_list->shared[i]); + rcu_read_lock(); + src_list = rcu_dereference(src->fence); + if (!src_list || src_list->shared_count > shared_count) { + kfree(dst_list); + goto retry; + } + + dst_list->shared_count = 0; + dst_list->shared_max = shared_count; + for (i = 0; i < src_list->shared_count; ++i) { + struct dma_fence *fence; + + fence = rcu_dereference(src_list->shared[i]); + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, + &fence->flags)) + continue; + + if (!dma_fence_get_rcu(fence)) { + kfree(dst_list); + src_list = rcu_dereference(src->fence); + goto retry; + } + + if (dma_fence_is_signaled(fence)) { + dma_fence_put(fence); + continue; + } + + dst_list->shared[dst_list->shared_count++] = fence; + } } else { dst_list = NULL; } + new = dma_fence_get_rcu_safe(&src->fence_excl); + rcu_read_unlock(); + kfree(dst->staged); dst->staged = NULL; src_list = reservation_object_get_list(dst); - old = reservation_object_get_excl(dst); - new = reservation_object_get_excl(src); - - dma_fence_get(new); preempt_disable(); write_seqcount_begin(&dst->seq); From 99b0ea0391d1b49ed4a944a07ad57ee448b0541c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Fri, 13 Oct 2017 17:24:31 +0200 Subject: [PATCH 0069/1013] drm/amdgpu: reserve root PD while releasing it MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 2642cf110d08a403f585a051e4cbf45a90b3adea upstream. Otherwise somebody could try to evict it at the same time and try to use half torn down structures. 
Signed-off-by: Christian König Reviewed-and-Tested-by: Michel Dänzer Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Lyude Paul Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 4fb5e14d4879..863c6dd0123a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2586,7 +2586,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm) { struct amdgpu_bo_va_mapping *mapping, *tmp; bool prt_fini_needed = !!adev->gart.gart_funcs->set_prt; - int i; + struct amdgpu_bo *root; + int i, r; amd_sched_entity_fini(vm->entity.sched, &vm->entity); @@ -2609,7 +2610,15 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm) amdgpu_vm_free_mapping(adev, vm, mapping, NULL); } - amdgpu_vm_free_levels(&vm->root); + root = amdgpu_bo_ref(vm->root.bo); + r = amdgpu_bo_reserve(root, true); + if (r) { + dev_err(adev->dev, "Leaking page tables because BO reservation failed\n"); + } else { + amdgpu_vm_free_levels(&vm->root); + amdgpu_bo_unreserve(root); + } + amdgpu_bo_unref(&root); dma_fence_put(vm->last_dir_update); for (i = 0; i < AMDGPU_MAX_VMHUBS; i++) amdgpu_vm_free_reserved_vmid(adev, vm, i); From 6cd3e4a3e697cca84f27a0af68ba59a6ab00892f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Fri, 3 Nov 2017 16:00:35 +0100 Subject: [PATCH 0070/1013] drm/ttm: Always and only destroy bo->ttm_resv in ttm_bo_release_list MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e1fc12c5d9ad06a2a74e97a91f1b0c5f4c723b50 upstream. Fixes a use-after-free due to a race condition in ttm_bo_cleanup_refs_and_unlock, which allows one task to reserve a BO and destroy its ttm_resv while another task is waiting for it to signal in reservation_object_wait_timeout_rcu. 
v2: * Always initialize bo->ttm_resv in ttm_bo_init_reserved (Christian König) Fixes: 0d2bd2ae045d "drm/ttm: fix memory leak while individualizing BOs" Reviewed-by: Chunming Zhou # v1 Reviewed-by: Christian König Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Lyude Paul Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/ttm/ttm_bo.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index bee77d31895b..c088703777e2 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -150,8 +150,7 @@ static void ttm_bo_release_list(struct kref *list_kref) ttm_tt_destroy(bo->ttm); atomic_dec(&bo->glob->bo_count); dma_fence_put(bo->moving); - if (bo->resv == &bo->ttm_resv) - reservation_object_fini(&bo->ttm_resv); + reservation_object_fini(&bo->ttm_resv); mutex_destroy(&bo->wu_mutex); if (bo->destroy) bo->destroy(bo); @@ -402,14 +401,11 @@ static int ttm_bo_individualize_resv(struct ttm_buffer_object *bo) if (bo->resv == &bo->ttm_resv) return 0; - reservation_object_init(&bo->ttm_resv); BUG_ON(!reservation_object_trylock(&bo->ttm_resv)); r = reservation_object_copy_fences(&bo->ttm_resv, bo->resv); - if (r) { + if (r) reservation_object_unlock(&bo->ttm_resv); - reservation_object_fini(&bo->ttm_resv); - } return r; } @@ -459,6 +455,7 @@ static void ttm_bo_cleanup_refs_or_queue(struct ttm_buffer_object *bo) spin_unlock(&glob->lru_lock); if (bo->resv != &bo->ttm_resv) reservation_object_unlock(&bo->ttm_resv); + ttm_bo_cleanup_memtype_use(bo); return; } @@ -1205,8 +1202,8 @@ int ttm_bo_init_reserved(struct ttm_bo_device *bdev, lockdep_assert_held(&bo->resv->lock.base); } else { bo->resv = &bo->ttm_resv; - reservation_object_init(&bo->ttm_resv); } + reservation_object_init(&bo->ttm_resv); atomic_inc(&bo->glob->bo_count); drm_vma_node_reset(&bo->vma_node); bo->priority = 0; From 7c33c32d39b9d11338624d4a7bad4e75bf64fbf8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Tue, 10 Oct 2017 16:33:22 +0300 Subject: [PATCH 0071/1013] drm/vblank: Fix flip event vblank count MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 632c6e4edef17c40bba3be67c980d959790d142f upstream. On machines where the vblank interrupt fires some time after the start of vblank (or we just manage to race with the vblank interrupt handler) we will currently stuff a stale vblank counter value into the flip event, and thus we'll prematurely complete the flip. Switch over to drm_crtc_accurate_vblank_count() to make sure we have an up to date counter value, crucially also remember to add the +1 so that the delayed vblank interrupt won't complete the flip prematurely. 
Cc: Daniel Vetter Suggested-by: Daniel Vetter Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20171010133322.24029-1-ville.syrjala@linux.intel.com Reviewed-by: Daniel Vetter #irc Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_vblank.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index 70f2b9593edc..f14e6c92e332 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -869,7 +869,7 @@ void drm_crtc_arm_vblank_event(struct drm_crtc *crtc, assert_spin_locked(&dev->event_lock); e->pipe = pipe; - e->event.sequence = drm_vblank_count(dev, pipe); + e->event.sequence = drm_crtc_accurate_vblank_count(crtc) + 1; e->event.crtc_id = crtc->base.id; list_add_tail(&e->base.link, &dev->vblank_event_list); } From a5851fdb17dda5f0854417f15a0c9c833a92dcc6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Mon, 23 Oct 2017 18:25:40 +0300 Subject: [PATCH 0072/1013] drm/vblank: Tune drm_crtc_accurate_vblank_count() WARN down to a debug MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit a111fbc4c44d2981f1a8fef64418685be5e30280 upstream. Since commit 632c6e4edef1 ("drm/vblank: Fix flip event vblank count") even drivers that don't implement accurate vblank timestamps will end up using drm_crtc_accurate_vblank_count(). That leads to a WARN every time drm_crtc_arm_vblank_event() gets called. The could be as often as every frame for each active crtc. Considering drm_crtc_accurate_vblank_count() is never any worse than the drm_vblank_count() we used previously, let's just skip the WARN unless DRM_UT_VBL is enabled. That way people won't be bothered by this unless they're debugging vblank code. And let's also change it to WARN_ONCE() so that even when you're debugging vblank code you won't get drowned by constant WARNs. Cc: Daniel Vetter Cc: "Szyprowski, Marek" Cc: Andrzej Hajda Reported-by: Andrzej Hajda Fixes: 632c6e4edef1 ("drm/vblank: Fix flip event vblank count") Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20171023152540.15364-1-ville.syrjala@linux.intel.com Acked-by: Benjamin Gaignard Reviewed-by: Daniel Vetter Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_vblank.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c index f14e6c92e332..17e8ef9a1c11 100644 --- a/drivers/gpu/drm/drm_vblank.c +++ b/drivers/gpu/drm/drm_vblank.c @@ -311,8 +311,8 @@ u32 drm_crtc_accurate_vblank_count(struct drm_crtc *crtc) u32 vblank; unsigned long flags; - WARN(!dev->driver->get_vblank_timestamp, - "This function requires support for accurate vblank timestamps."); + WARN_ONCE(drm_debug & DRM_UT_VBL && !dev->driver->get_vblank_timestamp, + "This function requires support for accurate vblank timestamps."); spin_lock_irqsave(&dev->vblank_time_lock, flags); From e3c5870862d95ff198a1902c3d7bd2a548b89ea7 Mon Sep 17 00:00:00 2001 From: Jyri Sarha Date: Thu, 12 Oct 2017 12:19:46 +0300 Subject: [PATCH 0073/1013] drm/tilcdc: Precalculate total frametime in tilcdc_crtc_set_mode() commit ce99f7206c9105851d97202ed08c058af6f11ac4 upstream. We need the total frame refresh time to check if we are too close to vertical sync when updating the two framebuffer DMA registers and risk a collision. 
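A rough worked example of the precalculation done by the new tilcdc_mode_hvtotal() helper (illustrative only; the 1080p60 timing numbers below are not taken from this patch):

    /*
     * mode->clock is in kHz, so 1000 * htotal * vtotal / clock yields
     * microseconds. For a common 1080p60 timing with htotal = 2200,
     * vtotal = 1125 and clock = 148500 kHz:
     *
     *   2200 * 1125 * 1000 / 148500 = 16666 us, i.e. roughly 1/60 s
     */
    tilcdc_crtc->hvtotal_us = tilcdc_mode_hvtotal(&crtc->hwmode);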
This new method is more accurate than the previous one that was based on the mode's vrefresh value, which itself is inaccurate or may not even be initialized. Reported-by: Kevin Hao Fixes: 11abbc9f39e0 ("drm/tilcdc: Set framebuffer DMA address to HW only if CRTC is enabled") Signed-off-by: Jyri Sarha Reviewed-by: Tomi Valkeinen Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/tilcdc/tilcdc_crtc.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c index 406fe4544b83..06d6e785c920 100644 --- a/drivers/gpu/drm/tilcdc/tilcdc_crtc.c +++ b/drivers/gpu/drm/tilcdc/tilcdc_crtc.c @@ -24,6 +24,7 @@ #include #include #include +#include #include "tilcdc_drv.h" #include "tilcdc_regs.h" @@ -48,6 +49,7 @@ struct tilcdc_crtc { unsigned int lcd_fck_rate; ktime_t last_vblank; + unsigned int hvtotal_us; struct drm_framebuffer *curr_fb; struct drm_framebuffer *next_fb; @@ -292,6 +294,12 @@ static void tilcdc_crtc_set_clk(struct drm_crtc *crtc) LCDC_V2_CORE_CLK_EN); } +uint tilcdc_mode_hvtotal(const struct drm_display_mode *mode) +{ + return (uint) div_u64(1000llu * mode->htotal * mode->vtotal, + mode->clock); +} + static void tilcdc_crtc_set_mode(struct drm_crtc *crtc) { struct tilcdc_crtc *tilcdc_crtc = to_tilcdc_crtc(crtc); @@ -459,6 +467,9 @@ static void tilcdc_crtc_set_mode(struct drm_crtc *crtc) drm_framebuffer_reference(fb); crtc->hwmode = crtc->state->adjusted_mode; + + tilcdc_crtc->hvtotal_us = + tilcdc_mode_hvtotal(&crtc->hwmode); } static void tilcdc_crtc_enable(struct drm_crtc *crtc) @@ -648,7 +659,7 @@ int tilcdc_crtc_update_fb(struct drm_crtc *crtc, spin_lock_irqsave(&tilcdc_crtc->irq_lock, flags); next_vblank = ktime_add_us(tilcdc_crtc->last_vblank, - 1000000 / crtc->hwmode.vrefresh); + tilcdc_crtc->hvtotal_us); tdiff = ktime_to_us(ktime_sub(next_vblank, ktime_get())); if (tdiff < TILCDC_VBLANK_SAFETY_THRESHOLD_US) From 8153a0fc1212157a4bf8b303cbf51993ff87aa7c Mon Sep 17 00:00:00 2001 From: Roman Kapl Date: Mon, 30 Oct 2017 11:56:13 +0100 Subject: [PATCH 0074/1013] drm/radeon: fix atombios on big endian commit 4f626a4ac8f57ddabf06d03870adab91e463217f upstream. The function for byteswapping the data sent to/from atombios was buggy for num_bytes not divisible by four. The function must be aware of the fact that after byte-swapping the u32 units, valid bytes might end up after the num_bytes boundary. This patch was tested on kernel 3.12 and allowed us to successfully use DisplayPort on a Radeon SI card. Namely it fixed the link training and EDID readout. The function is patched both in radeon and amd drivers, since the functions and the fixes are identical. Signed-off-by: Roman Kapl Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c | 38 ++++++++++---------- drivers/gpu/drm/radeon/atombios_dp.c | 38 ++++++++++---------- 2 files changed, 36 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c index ce443586a0c7..cc4e18dcd8b6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c @@ -1766,34 +1766,32 @@ bool amdgpu_atombios_scratch_need_asic_init(struct amdgpu_device *adev) return true; } -/* Atom needs data in little endian format - * so swap as appropriate when copying data to - * or from atom. Note that atom operates on - * dw units.
+/* Atom needs data in little endian format so swap as appropriate when copying + * data to or from atom. Note that atom operates on dw units. + * + * Use to_le=true when sending data to atom and provide at least + * ALIGN(num_bytes,4) bytes in the dst buffer. + * + * Use to_le=false when receiving data from atom and provide ALIGN(num_bytes,4) + * byes in the src buffer. */ void amdgpu_atombios_copy_swap(u8 *dst, u8 *src, u8 num_bytes, bool to_le) { #ifdef __BIG_ENDIAN - u8 src_tmp[20], dst_tmp[20]; /* used for byteswapping */ - u32 *dst32, *src32; + u32 src_tmp[5], dst_tmp[5]; int i; + u8 align_num_bytes = ALIGN(num_bytes, 4); - memcpy(src_tmp, src, num_bytes); - src32 = (u32 *)src_tmp; - dst32 = (u32 *)dst_tmp; if (to_le) { - for (i = 0; i < ((num_bytes + 3) / 4); i++) - dst32[i] = cpu_to_le32(src32[i]); - memcpy(dst, dst_tmp, num_bytes); + memcpy(src_tmp, src, num_bytes); + for (i = 0; i < align_num_bytes / 4; i++) + dst_tmp[i] = cpu_to_le32(src_tmp[i]); + memcpy(dst, dst_tmp, align_num_bytes); } else { - u8 dws = num_bytes & ~3; - for (i = 0; i < ((num_bytes + 3) / 4); i++) - dst32[i] = le32_to_cpu(src32[i]); - memcpy(dst, dst_tmp, dws); - if (num_bytes % 4) { - for (i = 0; i < (num_bytes % 4); i++) - dst[dws+i] = dst_tmp[dws+i]; - } + memcpy(src_tmp, src, align_num_bytes); + for (i = 0; i < align_num_bytes / 4; i++) + dst_tmp[i] = le32_to_cpu(src_tmp[i]); + memcpy(dst, dst_tmp, num_bytes); } #else memcpy(dst, src, num_bytes); diff --git a/drivers/gpu/drm/radeon/atombios_dp.c b/drivers/gpu/drm/radeon/atombios_dp.c index 432cb46f6a34..fd7682bf335d 100644 --- a/drivers/gpu/drm/radeon/atombios_dp.c +++ b/drivers/gpu/drm/radeon/atombios_dp.c @@ -45,34 +45,32 @@ static char *pre_emph_names[] = { /***** radeon AUX functions *****/ -/* Atom needs data in little endian format - * so swap as appropriate when copying data to - * or from atom. Note that atom operates on - * dw units. +/* Atom needs data in little endian format so swap as appropriate when copying + * data to or from atom. Note that atom operates on dw units. + * + * Use to_le=true when sending data to atom and provide at least + * ALIGN(num_bytes,4) bytes in the dst buffer. + * + * Use to_le=false when receiving data from atom and provide ALIGN(num_bytes,4) + * byes in the src buffer. 
*/ void radeon_atom_copy_swap(u8 *dst, u8 *src, u8 num_bytes, bool to_le) { #ifdef __BIG_ENDIAN - u8 src_tmp[20], dst_tmp[20]; /* used for byteswapping */ - u32 *dst32, *src32; + u32 src_tmp[5], dst_tmp[5]; int i; + u8 align_num_bytes = ALIGN(num_bytes, 4); - memcpy(src_tmp, src, num_bytes); - src32 = (u32 *)src_tmp; - dst32 = (u32 *)dst_tmp; if (to_le) { - for (i = 0; i < ((num_bytes + 3) / 4); i++) - dst32[i] = cpu_to_le32(src32[i]); - memcpy(dst, dst_tmp, num_bytes); + memcpy(src_tmp, src, num_bytes); + for (i = 0; i < align_num_bytes / 4; i++) + dst_tmp[i] = cpu_to_le32(src_tmp[i]); + memcpy(dst, dst_tmp, align_num_bytes); } else { - u8 dws = num_bytes & ~3; - for (i = 0; i < ((num_bytes + 3) / 4); i++) - dst32[i] = le32_to_cpu(src32[i]); - memcpy(dst, dst_tmp, dws); - if (num_bytes % 4) { - for (i = 0; i < (num_bytes % 4); i++) - dst[dws+i] = dst_tmp[dws+i]; - } + memcpy(src_tmp, src, align_num_bytes); + for (i = 0; i < align_num_bytes / 4; i++) + dst_tmp[i] = le32_to_cpu(src_tmp[i]); + memcpy(dst, dst_tmp, num_bytes); } #else memcpy(dst, src, num_bytes); From 87139913c0e2d0a7270868a7a36d549dfd3fad5d Mon Sep 17 00:00:00 2001 From: Jonathan Liu Date: Mon, 7 Aug 2017 21:55:45 +1000 Subject: [PATCH 0075/1013] drm/panel: simple: Add missing panel_simple_unprepare() calls commit f3621a8eb59a913612c8e6e37d81f16b649f8b6c upstream. During panel removal or system shutdown panel_simple_disable() is called which disables the panel backlight but the panel is still powered due to missing calls to panel_simple_unprepare(). Fixes: d02fd93e2cd8 ("drm/panel: simple - Disable panel on shutdown") Signed-off-by: Jonathan Liu Signed-off-by: Thierry Reding Link: https://patchwork.freedesktop.org/patch/msgid/20170807115545.27747-1-net147@gmail.com Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/panel/panel-simple.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c index 474fa759e06e..234af81fb3d0 100644 --- a/drivers/gpu/drm/panel/panel-simple.c +++ b/drivers/gpu/drm/panel/panel-simple.c @@ -369,6 +369,7 @@ static int panel_simple_remove(struct device *dev) drm_panel_remove(&panel->base); panel_simple_disable(&panel->base); + panel_simple_unprepare(&panel->base); if (panel->ddc) put_device(&panel->ddc->dev); @@ -384,6 +385,7 @@ static void panel_simple_shutdown(struct device *dev) struct panel_simple *panel = dev_get_drvdata(dev); panel_simple_disable(&panel->base); + panel_simple_unprepare(&panel->base); } static const struct drm_display_mode ampire_am_480272h3tmqw_t01h_mode = { From f5e69000aa80880491ca4a613679bd21c60b048b Mon Sep 17 00:00:00 2001 From: Peter Griffin Date: Tue, 15 Aug 2017 15:14:25 +0100 Subject: [PATCH 0076/1013] drm/hisilicon: Ensure LDI regs are properly configured. commit a2f042430784d86eb2b7a6d2a869f552da30edba upstream. This patch fixes the following soft lockup: BUG: soft lockup - CPU#0 stuck for 23s! [weston:307] On weston idle-timeout the IP is powered down and reset asserted. On weston resume we get a massive vblank IRQ storm due to the LDI registers having lost some state. This state loss is caused by ade_crtc_atomic_begin() not calling ade_ldi_set_mode(). With this patch applied resuming from Weston idle-timeout works well. 
Signed-off-by: Peter Griffin Tested-by: John Stultz Reviewed-by: Xinliang Liu Signed-off-by: Xinliang Liu Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c index 9823477b1855..2269be91f3e1 100644 --- a/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c +++ b/drivers/gpu/drm/hisilicon/kirin/kirin_drm_ade.c @@ -534,9 +534,12 @@ static void ade_crtc_atomic_begin(struct drm_crtc *crtc, { struct ade_crtc *acrtc = to_ade_crtc(crtc); struct ade_hw_ctx *ctx = acrtc->ctx; + struct drm_display_mode *mode = &crtc->state->mode; + struct drm_display_mode *adj_mode = &crtc->state->adjusted_mode; if (!ctx->power_on) (void)ade_power_up(ctx); + ade_ldi_set_mode(acrtc, mode, adj_mode); } static void ade_crtc_atomic_flush(struct drm_crtc *crtc, From 78f4bb6e965ddd9f60f607dfaf6e20109da84f6d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Christian=20K=C3=B6nig?= Date: Mon, 30 Oct 2017 14:57:43 +0100 Subject: [PATCH 0077/1013] drm/ttm: once more fix ttm_buffer_object_transfer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 4d98e5ee6084f6d7bc578c5d5f86de7156aaa4cb upstream. When the mutex is locked just in the moment we copy it we end up with a warning that we release a locked mutex. Fix this by properly reinitializing the mutex. Signed-off-by: Christian König Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/ttm/ttm_bo_util.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index c934ad5b3903..7c2fbdbbd048 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -474,6 +474,7 @@ static int ttm_buffer_object_transfer(struct ttm_buffer_object *bo, INIT_LIST_HEAD(&fbo->lru); INIT_LIST_HEAD(&fbo->swap); INIT_LIST_HEAD(&fbo->io_reserve_lru); + mutex_init(&fbo->wu_mutex); fbo->moving = NULL; drm_vma_node_reset(&fbo->vma_node); atomic_set(&fbo->cpu_writers, 0); From e4c84b504e02f6f7103cdc67b1adbe304910c104 Mon Sep 17 00:00:00 2001 From: Rex Zhu Date: Fri, 17 Nov 2017 16:41:16 +0800 Subject: [PATCH 0078/1013] drm/amd/pp: fix typecast error in powerplay. commit 8d8258bdab735d9f3c4b78e091ecfbb2b2b1f2ca upstream. 
resulted in unexpected data truncation Reviewed-by: Alex Deucher Signed-off-by: Rex Zhu Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c index 84f01fd33aff..b50aa292d026 100644 --- a/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/process_pptables_v1_0.c @@ -850,9 +850,9 @@ static int init_over_drive_limits( const ATOM_Tonga_POWERPLAYTABLE *powerplay_table) { hwmgr->platform_descriptor.overdriveLimit.engineClock = - le16_to_cpu(powerplay_table->ulMaxODEngineClock); + le32_to_cpu(powerplay_table->ulMaxODEngineClock); hwmgr->platform_descriptor.overdriveLimit.memoryClock = - le16_to_cpu(powerplay_table->ulMaxODMemoryClock); + le32_to_cpu(powerplay_table->ulMaxODMemoryClock); hwmgr->platform_descriptor.minOverdriveVDDC = 0; hwmgr->platform_descriptor.maxOverdriveVDDC = 0; From dde499d3cf6edea946ff59cb5d72e4f9ef21f3fd Mon Sep 17 00:00:00 2001 From: Maarten Lankhorst Date: Tue, 28 Nov 2017 12:16:03 +0100 Subject: [PATCH 0079/1013] drm/fb_helper: Disable all crtc's when initial setup fails. commit 52dd06506e9bbc2a37b352df7dfc5468f8da21fd upstream. Some drivers like i915 start with crtc's enabled, but with deferred fbcon setup they were no longer disabled as part of fbdev setup. Headless units could no longer enter pc3 state because the crtc was still enabled. Fix this by calling restore_fbdev_mode when we would have called it otherwise once during initial fbdev setup. Signed-off-by: Maarten Lankhorst Fixes: ca91a2758fce ("drm/fb-helper: Support deferred setup") Reported-by: Thomas Voegtle Tested-by: Thomas Voegtle Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20171128111603.62757-1-maarten.lankhorst@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_fb_helper.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index 1b8f013ffa65..5e93589c335c 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1809,6 +1809,10 @@ static int drm_fb_helper_single_fb_probe(struct drm_fb_helper *fb_helper, if (crtc_count == 0 || sizes.fb_width == -1 || sizes.fb_height == -1) { DRM_INFO("Cannot find any crtc or sizes\n"); + + /* First time: disable all crtc's.. */ + if (!fb_helper->deferred_setup && !READ_ONCE(fb_helper->dev->master)) + restore_fbdev_mode(fb_helper); return -EAGAIN; } From e685410c5f30a19c54d652e033942ed270d31b61 Mon Sep 17 00:00:00 2001 From: Laurent Pinchart Date: Fri, 10 Nov 2017 17:38:34 +0100 Subject: [PATCH 0080/1013] drm/fsl-dcu: Don't set connector DPMS property commit daee54263c1202cbdab85c5e15ae30b417602efb upstream. Since commit 4a97a3da420b ("drm: Don't update property values for atomic drivers") atomic drivers must not update property values as properties are read from the state instead. To catch remaining users, the drm_object_property_set_value() function now throws a warning when called by atomic drivers on non-immutable properties, and we hit that warning when creating connectors. The easy fix is to just remove the drm_object_property_set_value() as it is used here to set the initial value of the connector's DPMS property to OFF. 
The DPMS property applies on top of the connector's state crtc pointer (initialized to NULL) that is the main connector on/off control, and should thus default to ON. Fixes: 4a97a3da420b ("drm: Don't update property values for atomic drivers") Signed-off-by: Laurent Pinchart Signed-off-by: Stefan Agner Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_rgb.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_rgb.c b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_rgb.c index edd7d8127d19..c54806d08dd7 100644 --- a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_rgb.c +++ b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_rgb.c @@ -102,7 +102,6 @@ static int fsl_dcu_attach_panel(struct fsl_dcu_drm_device *fsl_dev, { struct drm_encoder *encoder = &fsl_dev->encoder; struct drm_connector *connector = &fsl_dev->connector.base; - struct drm_mode_config *mode_config = &fsl_dev->drm->mode_config; int ret; fsl_dev->connector.encoder = encoder; @@ -122,10 +121,6 @@ static int fsl_dcu_attach_panel(struct fsl_dcu_drm_device *fsl_dev, if (ret < 0) goto err_sysfs; - drm_object_property_set_value(&connector->base, - mode_config->dpms_property, - DRM_MODE_DPMS_OFF); - ret = drm_panel_attach(panel, connector); if (ret) { dev_err(fsl_dev->dev, "failed to attach panel\n"); From d8f74d70634aa0dfe063d0e445f683b0545fa5b8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Wed, 8 Nov 2017 17:25:04 +0200 Subject: [PATCH 0081/1013] drm/edid: Don't send non-zero YQ in AVI infoframe for HDMI 1.x sinks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 9271c0ca573e02a360b636ecd8cb408852f4e9f6 upstream. Apparently some sinks look at the YQ bits even when receiving RGB, and they get somehow confused when they see a non-zero YQ value. So we can't just blindly follow CEA-861-F and set YQ to match the RGB range. Unfortunately there is no good way to tell whether the sink designer claims to have read CEA-861-F. The CEA extension block revision number has generally been stuck at 3 since forever, and even a very recently manufactured sink might be based on an old design so the manufacturing date doesn't seem like something we can use. In lieu of better information let's follow CEA-861-F only for HDMI 2.0 sinks, since HDMI 2.0 is based on CEA-861-F. For HDMI 1.x sinks we'll always set YQ=0. The alternative would of course be to always set YQ=0. And if we ever encounter a HDMI 2.0+ sink with this bug that's what we'll probably have to do. 
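The resulting behaviour can be summarised with a small sketch that mirrors the condition added below (no new logic, just the three possible cases spelled out):

    if (!is_hdmi2_sink ||
        rgb_quant_range == HDMI_QUANTIZATION_RANGE_LIMITED)
        frame->ycc_quantization_range =
            HDMI_YCC_QUANTIZATION_RANGE_LIMITED;    /* YQ = 0 */
    else
        frame->ycc_quantization_range =
            HDMI_YCC_QUANTIZATION_RANGE_FULL;       /* YQ = 1 */

    /* HDMI 1.x sink, any RGB range       -> YQ = 0                 */
    /* HDMI 2.0 sink, limited range RGB   -> YQ = 0                 */
    /* HDMI 2.0 sink, full range RGB      -> YQ = 1 (per CEA-861-F) */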
Cc: Jani Nikula Cc: Eric Anholt Cc: Neil Kownacki Reported-by: Neil Kownacki Tested-by: Neil Kownacki Fixes: fcc8a22cc905 ("drm/edid: Set YQ bits in the AVI infoframe according to CEA-861-F") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101639 Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20171108152504.12596-1-ville.syrjala@linux.intel.com Acked-by: Eric Anholt Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_edid.c | 12 ++++++++++-- drivers/gpu/drm/i915/intel_hdmi.c | 3 ++- drivers/gpu/drm/vc4/vc4_hdmi.c | 3 ++- include/drm/drm_edid.h | 3 ++- 4 files changed, 16 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index 6bb6337be920..fc7946eb6665 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -4809,7 +4809,8 @@ void drm_hdmi_avi_infoframe_quant_range(struct hdmi_avi_infoframe *frame, const struct drm_display_mode *mode, enum hdmi_quantization_range rgb_quant_range, - bool rgb_quant_range_selectable) + bool rgb_quant_range_selectable, + bool is_hdmi2_sink) { /* * CEA-861: @@ -4833,8 +4834,15 @@ drm_hdmi_avi_infoframe_quant_range(struct hdmi_avi_infoframe *frame, * YQ-field to match the RGB Quantization Range being transmitted * (e.g., when Limited Range RGB, set YQ=0 or when Full Range RGB, * set YQ=1) and the Sink shall ignore the YQ-field." + * + * Unfortunate certain sinks (eg. VIZ Model 67/E261VA) get confused + * by non-zero YQ when receiving RGB. There doesn't seem to be any + * good way to tell which version of CEA-861 the sink supports, so + * we limit non-zero YQ to HDMI 2.0 sinks only as HDMI 2.0 is based + * on on CEA-861-F. */ - if (rgb_quant_range == HDMI_QUANTIZATION_RANGE_LIMITED) + if (!is_hdmi2_sink || + rgb_quant_range == HDMI_QUANTIZATION_RANGE_LIMITED) frame->ycc_quantization_range = HDMI_YCC_QUANTIZATION_RANGE_LIMITED; else diff --git a/drivers/gpu/drm/i915/intel_hdmi.c b/drivers/gpu/drm/i915/intel_hdmi.c index e8abea7594ec..3fed1d3ecded 100644 --- a/drivers/gpu/drm/i915/intel_hdmi.c +++ b/drivers/gpu/drm/i915/intel_hdmi.c @@ -481,7 +481,8 @@ static void intel_hdmi_set_avi_infoframe(struct drm_encoder *encoder, crtc_state->limited_color_range ? HDMI_QUANTIZATION_RANGE_LIMITED : HDMI_QUANTIZATION_RANGE_FULL, - intel_hdmi->rgb_quant_range_selectable); + intel_hdmi->rgb_quant_range_selectable, + is_hdmi2_sink); /* TODO: handle pixel repetition for YCBCR420 outputs */ intel_write_infoframe(encoder, crtc_state, &frame); diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c index 937da8dd65b8..8f71157a2b06 100644 --- a/drivers/gpu/drm/vc4/vc4_hdmi.c +++ b/drivers/gpu/drm/vc4/vc4_hdmi.c @@ -433,7 +433,8 @@ static void vc4_hdmi_set_avi_infoframe(struct drm_encoder *encoder) vc4_encoder->limited_rgb_range ? HDMI_QUANTIZATION_RANGE_LIMITED : HDMI_QUANTIZATION_RANGE_FULL, - vc4_encoder->rgb_range_selectable); + vc4_encoder->rgb_range_selectable, + false); vc4_hdmi_write_infoframe(encoder, &frame); } diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h index 1e1908a6b1d6..a992434ded99 100644 --- a/include/drm/drm_edid.h +++ b/include/drm/drm_edid.h @@ -360,7 +360,8 @@ void drm_hdmi_avi_infoframe_quant_range(struct hdmi_avi_infoframe *frame, const struct drm_display_mode *mode, enum hdmi_quantization_range rgb_quant_range, - bool rgb_quant_range_selectable); + bool rgb_quant_range_selectable, + bool is_hdmi2_sink); /** * drm_eld_mnl - Get ELD monitor name length in bytes. 
From 802328da073a0ba36a1cde899c48a5f07732d4aa Mon Sep 17 00:00:00 2001 From: Leo Liu Date: Tue, 21 Nov 2017 09:08:07 -0500 Subject: [PATCH 0082/1013] drm/amdgpu: move UVD/VCE and VCN structure out from union MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit b43aaee69d4327d05e7624d9471c17d015b4d67d upstream. With the enablement of VCN Dec and Enc from user space, user space queries the kernel for the IP information. If the HW has UVD/VCE, the info comes from these IP blocks, but this could end up being misinterpreted for VCN when they are in the union, and the same happens the other way around when the HW has a VCN block. Signed-off-by: Leo Liu Acked-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Alex Deucher Fixes: 95d0906f8506 ("drm/amdgpu: add initial vcn support and decode tests") Reviewed-and-Tested-by: Michel Dänzer Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 103635ab784c..87801faaf264 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1536,18 +1536,14 @@ struct amdgpu_device { /* sdma */ struct amdgpu_sdma sdma; - union { - struct { - /* uvd */ - struct amdgpu_uvd uvd; - - /* vce */ - struct amdgpu_vce vce; - }; - - /* vcn */ - struct amdgpu_vcn vcn; - }; + /* uvd */ + struct amdgpu_uvd uvd; + + /* vce */ + struct amdgpu_vce vce; + + /* vcn */ + struct amdgpu_vcn vcn; /* firmwares */ struct amdgpu_firmware firmware; From 248b0ec5ad2501a35f2603384f1c7c7cee79eb02 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Wed, 22 Nov 2017 15:55:21 +0100 Subject: [PATCH 0083/1013] drm/amdgpu: Set adev->vcn.irq.num_types for VCN MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 89ce6e0afee8eafa679093207dabd717af9d09c5 upstream. We were setting adev->uvd.irq.num_types instead. Fixes: 9b257116e784 ("drm/amdgpu: add vcn enc irq support") Reviewed-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c index 21e7b88401e1..a098712bdd2f 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -1175,7 +1175,7 @@ static const struct amdgpu_irq_src_funcs vcn_v1_0_irq_funcs = { static void vcn_v1_0_set_irq_funcs(struct amdgpu_device *adev) { - adev->uvd.irq.num_types = adev->vcn.num_enc_rings + 1; + adev->vcn.irq.num_types = adev->vcn.num_enc_rings + 1; adev->vcn.irq.funcs = &vcn_v1_0_irq_funcs; } From 75e63076249aedd3a7a3c38677d135cc146f1f4b Mon Sep 17 00:00:00 2001 From: Sandipan Das Date: Fri, 17 Nov 2017 15:27:28 -0800 Subject: [PATCH 0084/1013] include/linux/compiler-clang.h: handle randomizable anonymous structs commit 4ca59b14e588f873795a11cdc77a25c686a29d23 upstream. The GCC randomize layout plugin can randomize the member offsets of sensitive kernel data structures. To use this feature, certain annotations and members are added to the structures which affect the member offsets even if this plugin is not used. All of these structures are completely randomized, except for task_struct which leaves out some of its members.
All the other members are wrapped within an anonymous struct with the __randomize_layout attribute. This is done using the randomized_struct_fields_start and randomized_struct_fields_end defines. When the plugin is disabled, the behaviour of this attribute can vary based on the GCC version. For GCC 5.1+, this attribute maps to __designated_init otherwise it is just an empty define but the anonymous structure is still present. For other compilers, both randomized_struct_fields_start and randomized_struct_fields_end default to empty defines meaning the anonymous structure is not introduced at all. So, if a module compiled with Clang, such as a BPF program, needs to access task_struct fields such as pid and comm, the offsets of these members as recognized by Clang are different from those recognized by modules compiled with GCC. If GCC 4.6+ is used to build the kernel, this can be solved by introducing appropriate defines for Clang so that the anonymous structure is seen when determining the offsets for the members. Link: http://lkml.kernel.org/r/20171109064645.25581-1-sandipan@linux.vnet.ibm.com Signed-off-by: Sandipan Das Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Kate Stewart Cc: Thomas Gleixner Cc: Naveen N. Rao Cc: Alexei Starovoitov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler-clang.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 54dfef70a072..780b1242bf24 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -16,3 +16,6 @@ * with any version that can compile the kernel */ #define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __COUNTER__) + +#define randomized_struct_fields_start struct { +#define randomized_struct_fields_end }; From 2b24fdca153f0fd6c2d0f1f60c7606a53b6db575 Mon Sep 17 00:00:00 2001 From: Don Hiatt Date: Mon, 2 Oct 2017 11:04:48 -0700 Subject: [PATCH 0085/1013] IB/core: Do not warn on lid conversions for OPA commit 6588e412fe872ed81f3fb8d9b4561a66ecb763d0 upstream. On OPA devices the user_mad recv_handler can receive 32Bit LIDs (e.g. OPA_PERMISSIVE_LID) and it is okay to lose the upper 16 bits of the LID as this information is obtained elsewhere. Do not issue a warning when calling ib_lid_be16() in this case by masking out the upper 16Bits. 
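A minimal sketch of what the masking achieves (assuming, as the trace below indicates, that ib_lid_be16() warns whenever bits above bit 15 are set; 0xFFFFFFFF is used here as the illustrative OPA_PERMISSIVE_LID case):

    /* wc->slid may carry a 32-bit OPA LID, e.g. 0xFFFFFFFF */
    packet->mad.hdr.lid = ib_lid_be16(0xFFFF & mad_recv_wc->wc->slid);
    /* 0xFFFF & 0xFFFFFFFF == 0x0000FFFF, fits in 16 bits, no WARN */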
[75667.310846] ------------[ cut here ]------------ [75667.316447] WARNING: CPU: 0 PID: 1718 at ./include/rdma/ib_verbs.h:3799 recv_handler+0x15a/0x170 [ib_umad] [75667.327640] Modules linked in: ib_ipoib hfi1(E) rdmavt(E) rdma_ucm(E) ib_ucm(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_umad(E) ib_uverbs(E) ib_core(E) libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod dax x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mei_me ipmi_si iTCO_wdt iTCO_vendor_support crypto_simd ipmi_devintf pcspkr mei sg i2c_i801 glue_helper lpc_ich shpchp ioatdma mfd_core wmi ipmi_msghandler cryptd acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ptp ahci libahci pps_core crc32c_intel libata dca i2c_algo_bit i2c_core [last unloaded: ib_core] [75667.407704] CPU: 0 PID: 1718 Comm: kworker/0:1H Tainted: G W I E 4.13.0-rc7+ #1 [75667.417310] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [75667.429555] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [75667.436360] task: ffff88084a718000 task.stack: ffffc9000a424000 [75667.443549] RIP: 0010:recv_handler+0x15a/0x170 [ib_umad] [75667.450090] RSP: 0018:ffffc9000a427ce8 EFLAGS: 00010286 [75667.456508] RAX: 00000000ffffffff RBX: ffff88085159ce80 RCX: 0000000000000000 [75667.465094] RDX: ffff88085a47b068 RSI: 0000000000000000 RDI: ffff88085159cf00 [75667.473668] RBP: ffffc9000a427d38 R08: 000000000001efc0 R09: ffff88085159ce80 [75667.482228] R10: ffff88085f007480 R11: ffff88084acf20e8 R12: ffff88085a47b020 [75667.490824] R13: ffff881056842e10 R14: ffff881056840200 R15: ffff88104c8d0800 [75667.499390] FS: 0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000 [75667.509028] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [75667.516080] CR2: 00007f9e4b3d9000 CR3: 0000000001c09000 CR4: 00000000001406f0 [75667.524664] Call Trace: [75667.528044] ? find_mad_agent+0x7c/0x1b0 [ib_core] [75667.534031] ? ib_mark_mad_done+0x73/0xa0 [ib_core] [75667.540142] ib_mad_recv_done+0x423/0x9b0 [ib_core] [75667.546215] __ib_process_cq+0x5d/0xb0 [ib_core] [75667.552007] ib_cq_poll_work+0x20/0x60 [ib_core] [75667.557766] process_one_work+0x149/0x360 [75667.562844] worker_thread+0x4d/0x3c0 [75667.567529] kthread+0x109/0x140 [75667.571713] ? rescuer_thread+0x380/0x380 [75667.576775] ? 
kthread_park+0x60/0x60 [75667.581447] ret_from_fork+0x25/0x30 [75667.586014] Code: 43 4a 0f b6 45 c6 88 43 4b 48 8b 45 b0 48 89 43 4c 48 8b 45 b8 48 89 43 54 8b 45 c0 0f c8 89 43 5c e9 79 ff ff ff e8 16 4e fa e0 <0f> ff e9 42 ff ff ff 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 [75667.608323] ---[ end trace cf26df27c9597264 ]--- Fixes: 62ede7779904 ("Add OPA extended LID support") Reviewed-by: Mike Marciniszyn Signed-off-by: Don Hiatt Signed-off-by: Dennis Dalessandro Reviewed-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/user_mad.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index c1696e6084b2..603acaf91828 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -229,7 +229,16 @@ static void recv_handler(struct ib_mad_agent *agent, packet->mad.hdr.status = 0; packet->mad.hdr.length = hdr_size(file) + mad_recv_wc->mad_len; packet->mad.hdr.qpn = cpu_to_be32(mad_recv_wc->wc->src_qp); - packet->mad.hdr.lid = ib_lid_be16(mad_recv_wc->wc->slid); + /* + * On OPA devices it is okay to lose the upper 16 bits of LID as this + * information is obtained elsewhere. Mask off the upper 16 bits. + */ + if (agent->device->port_immutable[agent->port_num].core_cap_flags & + RDMA_CORE_PORT_INTEL_OPA) + packet->mad.hdr.lid = ib_lid_be16(0xFFFF & + mad_recv_wc->wc->slid); + else + packet->mad.hdr.lid = ib_lid_be16(mad_recv_wc->wc->slid); packet->mad.hdr.sl = mad_recv_wc->wc->sl; packet->mad.hdr.path_bits = mad_recv_wc->wc->dlid_path_bits; packet->mad.hdr.pkey_index = mad_recv_wc->wc->pkey_index; From c48f3c04984557e34cf2b7194eb4d7947edccc5e Mon Sep 17 00:00:00 2001 From: Don Hiatt Date: Mon, 2 Oct 2017 11:04:55 -0700 Subject: [PATCH 0086/1013] IB/hfi1: Do not warn on lid conversions for OPA commit 4988be5813ff2afdc0d8bfa315ef34a577d3efbf upstream. On OPA devices opa_local_smp_check will receive 32Bit LIDs when the LID is Extended. In such cases, it is okay to lose the upper 16 bits of the LID as this information is obtained elsewhere. Do not issue a warning when calling ib_lid_cpu16() in this case by masking out the upper 16Bits. 
[75920.148985] ------------[ cut here ]------------ [75920.154651] WARNING: CPU: 0 PID: 1718 at ./include/rdma/ib_verbs.h:3788 hfi1_process_mad+0x1c1f/0x1c80 [hfi1] [75920.166192] Modules linked in: ib_ipoib hfi1(E) rdmavt(E) rdma_ucm(E) ib_ucm(E) rdma_cm(E) ib_cm(E) iw_cm(E) ib_umad(E) ib_uverbs(E) ib_core(E) libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod dax x86_pkg_temp_thermal intel_powerclamp coretemp kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel mei_me ipmi_si iTCO_wdt iTCO_vendor_support crypto_simd ipmi_devintf pcspkr mei sg i2c_i801 glue_helper lpc_ich shpchp ioatdma mfd_core wmi ipmi_msghandler cryptd acpi_power_meter acpi_pad nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb ptp ahci libahci pps_core crc32c_intel libata dca i2c_algo_bit i2c_core [last unloaded: ib_core] [75920.246331] CPU: 0 PID: 1718 Comm: kworker/0:1H Tainted: G W I E 4.13.0-rc7+ #1 [75920.255907] Hardware name: Intel Corporation S2600WT2/S2600WT2, BIOS SE5C610.86B.01.01.0008.021120151325 02/11/2015 [75920.268158] Workqueue: ib-comp-wq ib_cq_poll_work [ib_core] [75920.274934] task: ffff88084a718000 task.stack: ffffc9000a424000 [75920.282123] RIP: 0010:hfi1_process_mad+0x1c1f/0x1c80 [hfi1] [75920.288881] RSP: 0018:ffffc9000a427c38 EFLAGS: 00010206 [75920.295265] RAX: 0000000000010001 RBX: ffff8808361420e8 RCX: ffff880837811d80 [75920.303784] RDX: 0000000000000002 RSI: 0000000000007fff RDI: ffff880837811d80 [75920.312302] RBP: ffffc9000a427d38 R08: 0000000000000000 R09: ffff8808361420e8 [75920.320819] R10: ffff88083841f0e8 R11: ffffc9000a427da8 R12: 0000000000000001 [75920.329335] R13: ffff880837810000 R14: 0000000000000000 R15: ffff88084f1a4800 [75920.337849] FS: 0000000000000000(0000) GS:ffff88085f400000(0000) knlGS:0000000000000000 [75920.347450] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [75920.354405] CR2: 00007f9e4b3d9000 CR3: 0000000001c09000 CR4: 00000000001406f0 [75920.362947] Call Trace: [75920.366257] ? ib_mad_recv_done+0x258/0x9b0 [ib_core] [75920.372457] ? ib_mad_recv_done+0x258/0x9b0 [ib_core] [75920.378652] ? __kmalloc+0x1df/0x210 [75920.383229] ib_mad_recv_done+0x305/0x9b0 [ib_core] [75920.389270] __ib_process_cq+0x5d/0xb0 [ib_core] [75920.395032] ib_cq_poll_work+0x20/0x60 [ib_core] [75920.400777] process_one_work+0x149/0x360 [75920.405836] worker_thread+0x4d/0x3c0 [75920.410505] kthread+0x109/0x140 [75920.414681] ? rescuer_thread+0x380/0x380 [75920.419731] ? 
kthread_park+0x60/0x60 [75920.424406] ret_from_fork+0x25/0x30 [75920.428972] Code: 4c 89 9d 58 ff ff ff 49 89 45 00 66 b8 00 02 49 89 45 08 e8 44 27 89 e0 4c 8b 9d 58 ff ff ff e9 d8 f6 ff ff 0f ff e9 55 e7 ff ff <0f> ff e9 3b e5 ff ff 0f ff 0f 1f 84 00 00 00 00 00 e9 4b e9 ff [75921.451269] ---[ end trace cf26df27c9597265 ]--- Fixes: 62ede7779904 ("Add OPA extended LID support") Reviewed-by: Mike Marciniszyn Signed-off-by: Don Hiatt Signed-off-by: Dennis Dalessandro Reviewed-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/mad.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/mad.c b/drivers/infiniband/hw/hfi1/mad.c index f4c0ffc040cc..07b80faf1675 100644 --- a/drivers/infiniband/hw/hfi1/mad.c +++ b/drivers/infiniband/hw/hfi1/mad.c @@ -4293,7 +4293,6 @@ static int opa_local_smp_check(struct hfi1_ibport *ibp, const struct ib_wc *in_wc) { struct hfi1_pportdata *ppd = ppd_from_ibp(ibp); - u16 slid = ib_lid_cpu16(in_wc->slid); u16 pkey; if (in_wc->pkey_index >= ARRAY_SIZE(ppd->pkeys)) @@ -4320,7 +4319,11 @@ static int opa_local_smp_check(struct hfi1_ibport *ibp, */ if (pkey == LIM_MGMT_P_KEY || pkey == FULL_MGMT_P_KEY) return 0; - ingress_pkey_table_fail(ppd, pkey, slid); + /* + * On OPA devices it is okay to lose the upper 16 bits of LID as this + * information is obtained elsewhere. Mask off the upper 16 bits. + */ + ingress_pkey_table_fail(ppd, pkey, ib_lid_cpu16(0xFFFF & in_wc->slid)); return 1; } From 635526b896ca34fff5487d3ac00d04afbee7e1e4 Mon Sep 17 00:00:00 2001 From: Sasha Neftin Date: Mon, 6 Nov 2017 08:31:59 +0200 Subject: [PATCH 0087/1013] e1000e: fix the use of magic numbers for buffer overrun issue commit c0f4b163a03e73055dd734eaca64b9580e72e7fb upstream. This is a follow on to commit b10effb92e27 ("fix buffer overrun while the I219 is processing DMA transactions") to address David Laights concerns about the use of "magic" numbers. So define masks as well as add additional code comments to give a better understanding of what needs to be done to avoid a buffer overrun. 
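The named masks map back to the old magic bits as follows (a quick sketch, not part of the patch):

    /* BIT(28) | BIT(29) == 0x10000000 | 0x20000000 == 0x30000000 */
    /* BIT(29)           == 0x20000000                            */
    reg_val &= ~E1000_TARC0_CB_MULTIQ_3_REQ;  /* clear TARC(0) bits 29:28 */
    reg_val |= E1000_TARC0_CB_MULTIQ_2_REQ;   /* allow 2 outstanding requests instead of 3 */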
Signed-off-by: Sasha Neftin Reviewed-by: Alexander H Duyck Reviewed-by: Dima Ruinskiy Reviewed-by: Raanan Avargil Tested-by: Aaron Brown Signed-off-by: Jeff Kirsher Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 ++- drivers/net/ethernet/intel/e1000e/netdev.c | 9 ++++++--- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.h b/drivers/net/ethernet/intel/e1000e/ich8lan.h index 67163ca898ba..00a36df02a3f 100644 --- a/drivers/net/ethernet/intel/e1000e/ich8lan.h +++ b/drivers/net/ethernet/intel/e1000e/ich8lan.h @@ -113,7 +113,8 @@ #define NVM_SIZE_MULTIPLIER 4096 /*multiplier for NVMS field */ #define E1000_FLASH_BASE_ADDR 0xE000 /*offset of NVM access regs */ #define E1000_CTRL_EXT_NVMVS 0x3 /*NVM valid sector */ -#define E1000_TARC0_CB_MULTIQ_3_REQ (1 << 28 | 1 << 29) +#define E1000_TARC0_CB_MULTIQ_3_REQ 0x30000000 +#define E1000_TARC0_CB_MULTIQ_2_REQ 0x20000000 #define PCIE_ICH8_SNOOP_ALL PCIE_NO_SNOOP_ALL #define E1000_ICH_RAR_ENTRIES 7 diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index c38b00c90f48..991c2a0dd67e 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -3030,9 +3030,12 @@ static void e1000_configure_tx(struct e1000_adapter *adapter) ew32(IOSFPC, reg_val); reg_val = er32(TARC(0)); - /* SPT and KBL Si errata workaround to avoid Tx hang */ - reg_val &= ~BIT(28); - reg_val |= BIT(29); + /* SPT and KBL Si errata workaround to avoid Tx hang. + * Dropping the number of outstanding requests from + * 3 to 2 in order to avoid a buffer overrun. + */ + reg_val &= ~E1000_TARC0_CB_MULTIQ_3_REQ; + reg_val |= E1000_TARC0_CB_MULTIQ_2_REQ; ew32(TARC(0), reg_val); } } From a67936842ba66eb229e6322856fefd2310e0b3fb Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 17 Oct 2017 14:24:09 +1100 Subject: [PATCH 0088/1013] md: forbid a RAID5 from having both a bitmap and a journal. commit 230b55fa8d64007339319539f8f8e68114d08529 upstream. Having both a bitmap and a journal is pointless. Attempting to do so can corrupt the bitmap if the journal replay happens before the bitmap is initialized. Rather than try to avoid this corruption, simply refuse to allow arrays with both a bitmap and a journal. So: - if raid5_run sees both are present, fail. - if adding a bitmap finds a journal is present, fail - if adding a journal finds a bitmap is present, fail. 
Signed-off-by: NeilBrown Tested-by: Joshua Kinard Acked-by: Joshua Kinard Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman --- drivers/md/bitmap.c | 6 ++++++ drivers/md/md.c | 2 +- drivers/md/raid5.c | 7 +++++++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index cae57b5be817..f425905c97fa 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -1816,6 +1816,12 @@ struct bitmap *bitmap_create(struct mddev *mddev, int slot) BUG_ON(file && mddev->bitmap_info.offset); + if (test_bit(MD_HAS_JOURNAL, &mddev->flags)) { + pr_notice("md/raid:%s: array with journal cannot have bitmap\n", + mdname(mddev)); + return ERR_PTR(-EBUSY); + } + bitmap = kzalloc(sizeof(*bitmap), GFP_KERNEL); if (!bitmap) return ERR_PTR(-ENOMEM); diff --git a/drivers/md/md.c b/drivers/md/md.c index e019cf8c0d13..98ea86309ceb 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -6362,7 +6362,7 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info) break; } } - if (has_journal) { + if (has_journal || mddev->bitmap) { export_rdev(rdev); return -EBUSY; } diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 928e24a07133..7aed69a4f655 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -7156,6 +7156,13 @@ static int raid5_run(struct mddev *mddev) min_offset_diff = diff; } + if ((test_bit(MD_HAS_JOURNAL, &mddev->flags) || journal_dev) && + (mddev->bitmap_info.offset || mddev->bitmap_info.file)) { + pr_notice("md/raid:%s: array cannot have both journal and bitmap\n", + mdname(mddev)); + return -EINVAL; + } + if (mddev->reshape_position != MaxSector) { /* Check that we can continue the reshape. * Difficulties arise if the stripe we would write to From 6613dc721fc232698aafd0f6e214137cfda61e35 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Fri, 10 Nov 2017 16:03:01 +0100 Subject: [PATCH 0089/1013] drm/i915: Fix false-positive assert_rpm_wakelock_held in i915_pmic_bus_access_notifier v2 commit f4359cedfb43b934f38c50d1604db21333abe57b upstream. assert_rpm_wakelock_held is triggered from i915_pmic_bus_access_notifier even though it gets unregistered on (runtime) suspend, this is caused by a race happening under the following circumstances: intel_runtime_pm_put does: atomic_dec(&dev_priv->pm.wakeref_count); pm_runtime_mark_last_busy(kdev); pm_runtime_put_autosuspend(kdev); And pm_runtime_put_autosuspend calls intel_runtime_suspend from a workqueue, so there is ample of time between the atomic_dec() and intel_runtime_suspend() unregistering the notifier. If the notifier gets called in this windowd assert_rpm_wakelock_held falsely triggers (at this point we're not runtime-suspended yet). This commit adds disable_rpm_wakeref_asserts and enable_rpm_wakeref_asserts calls around the intel_uncore_forcewake_get(FORCEWAKE_ALL) call in i915_pmic_bus_access_notifier fixing the false-positive WARN_ON. 
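A rough sketch of the window described above (ordering only, not taken from the patch), followed by the wrapper calls the patch adds:

    /*
     *   intel_runtime_pm_put()
     *     atomic_dec(&dev_priv->pm.wakeref_count);  <- wakeref count reaches 0
     *     pm_runtime_mark_last_busy(kdev);
     *     pm_runtime_put_autosuspend(kdev);         <- suspend only queued here
     *
     *   i915_pmic_bus_access_notifier()             <- may run in this window:
     *     assert_rpm_wakelock_held() -> WARN           notifier not unregistered
     *                                                  yet, device not suspended
     *
     *   intel_runtime_suspend()  (workqueue)        <- unregisters the notifier
     */
    disable_rpm_wakeref_asserts(dev_priv);
    intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
    enable_rpm_wakeref_asserts(dev_priv);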
Changes in v2: -Reword comment explaining why disabling the wakeref asserts is ok and necessary Reported-by: FKr Reviewed-by: Imre Deak Signed-off-by: Hans de Goede Link: https://patchwork.freedesktop.org/patch/msgid/20171110150301.9601-2-hdegoede@redhat.com (cherry picked from commit ce30560c80dead91e98a03d90fb8791e57a9b69d) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/intel_uncore.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 1d7b879cc68c..2d3aad319229 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -1171,8 +1171,15 @@ static int i915_pmic_bus_access_notifier(struct notifier_block *nb, * bus, which will be busy after this notification, leading to: * "render: timed out waiting for forcewake ack request." * errors. + * + * The notifier is unregistered during intel_runtime_suspend(), + * so it's ok to access the HW here without holding a RPM + * wake reference -> disable wakeref asserts for the time of + * the access. */ + disable_rpm_wakeref_asserts(dev_priv); intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL); + enable_rpm_wakeref_asserts(dev_priv); break; case MBI_PMIC_BUS_ACCESS_END: intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL); From 569e3d1de49f0896733de8e709fc1fd28034d7dc Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Tue, 14 Nov 2017 14:55:17 +0100 Subject: [PATCH 0090/1013] drm/i915: Re-register PMIC bus access notifier on runtime resume commit 294cf1af8cf2eb0d1eced377fdfb9a2d3f0e8b42 upstream. intel_uncore_suspend() unregisters the uncore code's PMIC bus access notifier and gets called on both normal and runtime suspend. intel_uncore_resume_early() re-registers the notifier, but only on normal resume. Add a new intel_uncore_runtime_resume() function which only re-registers the notifier and call that on runtime resume. Reported-by: Imre Deak Reviewed-by: Imre Deak Signed-off-by: Hans de Goede Link: https://patchwork.freedesktop.org/patch/msgid/20171114135518.15981-2-hdegoede@redhat.com (cherry picked from commit bedf4d79c3654921839b62246b0965ddb308b201) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/i915_drv.c | 2 ++ drivers/gpu/drm/i915/intel_uncore.c | 6 ++++++ drivers/gpu/drm/i915/intel_uncore.h | 1 + 3 files changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 9f45cfeae775..82498f8232eb 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -2591,6 +2591,8 @@ static int intel_runtime_resume(struct device *kdev) ret = vlv_resume_prepare(dev_priv, true); } + intel_uncore_runtime_resume(dev_priv); + /* * No point of rolling back things in case of an error, as the best * we can do is to hope that things will still work (and disable RPM). 
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c index 2d3aad319229..e9ed02518406 100644 --- a/drivers/gpu/drm/i915/intel_uncore.c +++ b/drivers/gpu/drm/i915/intel_uncore.c @@ -434,6 +434,12 @@ void intel_uncore_resume_early(struct drm_i915_private *dev_priv) i915_check_and_clear_faults(dev_priv); } +void intel_uncore_runtime_resume(struct drm_i915_private *dev_priv) +{ + iosf_mbi_register_pmic_bus_access_notifier( + &dev_priv->uncore.pmic_bus_access_nb); +} + void intel_uncore_sanitize(struct drm_i915_private *dev_priv) { i915.enable_rc6 = sanitize_rc6_option(dev_priv, i915.enable_rc6); diff --git a/drivers/gpu/drm/i915/intel_uncore.h b/drivers/gpu/drm/i915/intel_uncore.h index 5f90278da461..0bdc3fcc0e64 100644 --- a/drivers/gpu/drm/i915/intel_uncore.h +++ b/drivers/gpu/drm/i915/intel_uncore.h @@ -121,6 +121,7 @@ bool intel_uncore_arm_unclaimed_mmio_detection(struct drm_i915_private *dev_priv void intel_uncore_fini(struct drm_i915_private *dev_priv); void intel_uncore_suspend(struct drm_i915_private *dev_priv); void intel_uncore_resume_early(struct drm_i915_private *dev_priv); +void intel_uncore_runtime_resume(struct drm_i915_private *dev_priv); u64 intel_uncore_edram_size(struct drm_i915_private *dev_priv); void assert_forcewakes_inactive(struct drm_i915_private *dev_priv); From 7042c2b9a19e74972f5783ab3b7ef98ebdee293f Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Sat, 25 Nov 2017 19:41:55 +0000 Subject: [PATCH 0091/1013] drm/i915/fbdev: Serialise early hotplug events with async fbdev config commit a45b30a6c5db631e2ba680304bd5edd0cd1f9643 upstream. As both the hotplug event and fbdev configuration run asynchronously, it is possible for them to run concurrently. If configuration fails, we were freeing the fbdev causing a use-after-free in the hotplug event. <7>[ 3069.935211] [drm:intel_fb_initial_config [i915]] Not using firmware configuration <7>[ 3069.935225] [drm:drm_setup_crtcs] looking for cmdline mode on connector 77 <7>[ 3069.935229] [drm:drm_setup_crtcs] looking for preferred mode on connector 77 0 <7>[ 3069.935233] [drm:drm_setup_crtcs] found mode 3200x1800 <7>[ 3069.935236] [drm:drm_setup_crtcs] picking CRTCs for 8192x8192 config <7>[ 3069.935253] [drm:drm_setup_crtcs] desired mode 3200x1800 set on crtc 43 (0,0) <7>[ 3069.935323] [drm:intelfb_create [i915]] no BIOS fb, allocating a new one <4>[ 3069.967737] general protection fault: 0000 [#1] PREEMPT SMP <0>[ 3069.977453] --------------------------------- <4>[ 3069.977457] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm r8169 mei_me mii prime_numbers mei i2c_hid pinctrl_geminilake pinctrl_intel [last unloaded: i915] <4>[ 3069.977492] CPU: 1 PID: 15414 Comm: kworker/1:0 Tainted: G U 4.14.0-CI-CI_DRM_3388+ #1 <4>[ 3069.977497] Hardware name: Intel Corp. 
Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017 <4>[ 3069.977508] Workqueue: events output_poll_execute <4>[ 3069.977512] task: ffff880177734e40 task.stack: ffffc90001fe4000 <4>[ 3069.977519] RIP: 0010:__lock_acquire+0x109/0x1b60 <4>[ 3069.977523] RSP: 0018:ffffc90001fe7bb0 EFLAGS: 00010002 <4>[ 3069.977526] RAX: 6b6b6b6b6b6b6b6b RBX: 0000000000000282 RCX: 0000000000000000 <4>[ 3069.977530] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880170d4efd0 <4>[ 3069.977534] RBP: ffffc90001fe7c70 R08: 0000000000000001 R09: 0000000000000000 <4>[ 3069.977538] R10: 0000000000000000 R11: ffffffff81899609 R12: ffff880170d4efd0 <4>[ 3069.977542] R13: ffff880177734e40 R14: 0000000000000001 R15: 0000000000000000 <4>[ 3069.977547] FS: 0000000000000000(0000) GS:ffff88017fc80000(0000) knlGS:0000000000000000 <4>[ 3069.977551] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 3069.977555] CR2: 00007f7e8b7bcf04 CR3: 0000000003e0f000 CR4: 00000000003406e0 <4>[ 3069.977559] Call Trace: <4>[ 3069.977565] ? mark_held_locks+0x64/0x90 <4>[ 3069.977571] ? _raw_spin_unlock_irq+0x24/0x50 <4>[ 3069.977575] ? _raw_spin_unlock_irq+0x24/0x50 <4>[ 3069.977579] ? trace_hardirqs_on_caller+0xde/0x1c0 <4>[ 3069.977583] ? _raw_spin_unlock_irq+0x2f/0x50 <4>[ 3069.977588] ? finish_task_switch+0xa5/0x210 <4>[ 3069.977592] ? lock_acquire+0xaf/0x200 <4>[ 3069.977596] lock_acquire+0xaf/0x200 <4>[ 3069.977600] ? __mutex_lock+0x5e9/0x9b0 <4>[ 3069.977604] _raw_spin_lock+0x2a/0x40 <4>[ 3069.977608] ? __mutex_lock+0x5e9/0x9b0 <4>[ 3069.977612] __mutex_lock+0x5e9/0x9b0 <4>[ 3069.977616] ? drm_fb_helper_hotplug_event.part.19+0x16/0xa0 <4>[ 3069.977621] ? drm_fb_helper_hotplug_event.part.19+0x16/0xa0 <4>[ 3069.977625] drm_fb_helper_hotplug_event.part.19+0x16/0xa0 <4>[ 3069.977630] output_poll_execute+0x8d/0x180 <4>[ 3069.977635] process_one_work+0x22e/0x660 <4>[ 3069.977640] worker_thread+0x48/0x3a0 <4>[ 3069.977644] ? _raw_spin_unlock_irqrestore+0x4c/0x60 <4>[ 3069.977649] kthread+0x102/0x140 <4>[ 3069.977653] ? process_one_work+0x660/0x660 <4>[ 3069.977657] ? kthread_create_on_node+0x40/0x40 <4>[ 3069.977662] ret_from_fork+0x27/0x40 <4>[ 3069.977666] Code: 8d 62 f8 c3 49 81 3c 24 e0 fa 3c 82 41 be 00 00 00 00 45 0f 45 f0 83 fe 01 77 86 89 f0 49 8b 44 c4 08 48 85 c0 0f 84 76 ff ff ff ff 80 38 01 00 00 8b 1d 62 f9 e8 01 45 8b 85 b8 08 00 00 85 <1>[ 3069.977707] RIP: __lock_acquire+0x109/0x1b60 RSP: ffffc90001fe7bb0 <4>[ 3069.977712] ---[ end trace 4ad012eb3af62df7 ]--- In order to keep the dev_priv->ifbdev alive after failure, we have to avoid the free and leave it empty until we unload the module (which is less than ideal, but a necessary evil for simplicity). Then we can use intel_fbdev_sync() to serialise the hotplug event with the configuration. 
The serialisation between the two was removed in commit 934458c2c95d ("Revert "drm/i915: Fix races on fbdev""), but the use after free is much older, commit 366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails") Fixes: 366e39b4d2c5 ("drm/i915: Tear down fbdev if initialization fails") Fixes: 934458c2c95d ("Revert "drm/i915: Fix races on fbdev"") Signed-off-by: Chris Wilson Cc: Lukas Wunner Cc: Joonas Lahtinen Cc: Daniel Vetter Reviewed-by: Lukas Wunner Link: https://patchwork.freedesktop.org/patch/msgid/20171125194155.355-1-chris@chris-wilson.co.uk (cherry picked from commit ad88d7fc6c032ddfb32b8d496a070ab71de3a64f) Signed-off-by: Joonas Lahtinen Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/intel_fbdev.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c index 262e75c00dd2..da2d309574ba 100644 --- a/drivers/gpu/drm/i915/intel_fbdev.c +++ b/drivers/gpu/drm/i915/intel_fbdev.c @@ -694,10 +694,8 @@ static void intel_fbdev_initial_config(void *data, async_cookie_t cookie) /* Due to peculiar init order wrt to hpd handling this is separate. */ if (drm_fb_helper_initial_config(&ifbdev->helper, - ifbdev->preferred_bpp)) { + ifbdev->preferred_bpp)) intel_fbdev_unregister(to_i915(ifbdev->helper.dev)); - intel_fbdev_fini(to_i915(ifbdev->helper.dev)); - } } void intel_fbdev_initial_config_async(struct drm_device *dev) @@ -797,7 +795,11 @@ void intel_fbdev_output_poll_changed(struct drm_device *dev) { struct intel_fbdev *ifbdev = to_i915(dev)->fbdev; - if (ifbdev) + if (!ifbdev) + return; + + intel_fbdev_sync(ifbdev); + if (ifbdev->vma) drm_fb_helper_hotplug_event(&ifbdev->helper); } From 6c0d3d1d5908f5cd5c838c99663238cac201953b Mon Sep 17 00:00:00 2001 From: Xiong Zhang Date: Tue, 28 Nov 2017 07:29:54 +0800 Subject: [PATCH 0092/1013] drm/i915/gvt: Correct ADDR_4K/2M/1G_MASK definition commit b721b65af4eb46df6a1d9e34b14003225e403565 upstream. For ADDR_4K_MASK, bit[45..12] should be 1, all other bits should be 0. The current definition wrongly set bit[46] as 1 also. This path fixes this. v2: Add commit message, fixes and cc stable.(Zhenyu) Fixes: 2707e4446688("drm/i915/gvt: vGPU graphics memory virtualization") Signed-off-by: Xiong Zhang Signed-off-by: Zhenyu Wang Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/gvt/gtt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/gtt.c b/drivers/gpu/drm/i915/gvt/gtt.c index e6dfc3331f4b..a385838e2919 100644 --- a/drivers/gpu/drm/i915/gvt/gtt.c +++ b/drivers/gpu/drm/i915/gvt/gtt.c @@ -311,9 +311,9 @@ static inline int gtt_set_entry64(void *pt, #define GTT_HAW 46 -#define ADDR_1G_MASK (((1UL << (GTT_HAW - 30 + 1)) - 1) << 30) -#define ADDR_2M_MASK (((1UL << (GTT_HAW - 21 + 1)) - 1) << 21) -#define ADDR_4K_MASK (((1UL << (GTT_HAW - 12 + 1)) - 1) << 12) +#define ADDR_1G_MASK (((1UL << (GTT_HAW - 30)) - 1) << 30) +#define ADDR_2M_MASK (((1UL << (GTT_HAW - 21)) - 1) << 21) +#define ADDR_4K_MASK (((1UL << (GTT_HAW - 12)) - 1) << 12) static unsigned long gen8_gtt_get_pfn(struct intel_gvt_gtt_entry *e) { From e522b9247ce931a7319f1d7e9d0bdfa5817b9a4e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Thu, 23 Nov 2017 21:41:56 +0200 Subject: [PATCH 0093/1013] drm/i915: Don't try indexed reads to alternate slave addresses MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit ae5c631e605a452a5a0e73205a92810c01ed954b upstream. 
We can only specify the one slave address to indexed reads/writes. Make sure the messages we check are destined to the same slave address before deciding to do an indexed transfer. Cc: Daniel Kurtz Cc: Chris Wilson Cc: Daniel Vetter Cc: Sean Paul Fixes: 56f9eac05489 ("drm/i915/intel_i2c: use INDEX cycles for i2c read transactions") Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20171123194157.25367-2-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson (cherry picked from commit c4deb62d7821672265b87952bcd1c808f3bf3e8f) Signed-off-by: Joonas Lahtinen Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/intel_i2c.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c index eb5827110d8f..8affd47b98b8 100644 --- a/drivers/gpu/drm/i915/intel_i2c.c +++ b/drivers/gpu/drm/i915/intel_i2c.c @@ -438,6 +438,7 @@ static bool gmbus_is_index_read(struct i2c_msg *msgs, int i, int num) { return (i + 1 < num && + msgs[i].addr == msgs[i + 1].addr && !(msgs[i].flags & I2C_M_RD) && msgs[i].len <= 2 && (msgs[i + 1].flags & I2C_M_RD)); } From 88e1bb2a24125747d48b5e2d60bccdb247b86eb9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Thu, 23 Nov 2017 21:41:57 +0200 Subject: [PATCH 0094/1013] drm/i915: Prevent zero length "index" write MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 56350fb8978bbf4aafe08f21234e161dd128b417 upstream. The hardware always writes one or two bytes in the index portion of an indexed transfer. Make sure the message we send as the index doesn't have a zero length. Cc: Daniel Kurtz Cc: Chris Wilson Cc: Daniel Vetter Cc: Sean Paul Fixes: 56f9eac05489 ("drm/i915/intel_i2c: use INDEX cycles for i2c read transactions") Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20171123194157.25367-3-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson (cherry picked from commit bb9e0d4bca50f429152e74a459160b41f3d60fb2) Signed-off-by: Joonas Lahtinen Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/intel_i2c.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c index 8affd47b98b8..49fdf09f9919 100644 --- a/drivers/gpu/drm/i915/intel_i2c.c +++ b/drivers/gpu/drm/i915/intel_i2c.c @@ -439,7 +439,8 @@ gmbus_is_index_read(struct i2c_msg *msgs, int i, int num) { return (i + 1 < num && msgs[i].addr == msgs[i + 1].addr && - !(msgs[i].flags & I2C_M_RD) && msgs[i].len <= 2 && + !(msgs[i].flags & I2C_M_RD) && + (msgs[i].len == 1 || msgs[i].len == 2) && (msgs[i + 1].flags & I2C_M_RD)); } From e49d722f26b904015077655bff47cdbde049226a Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 4 Dec 2017 12:59:57 +0100 Subject: [PATCH 0095/1013] Revert "x86/entry/64: Add missing irqflags tracing to native_load_gs_index()" This reverts commit f9a64e23a9da528e7d8aa1bd2c7bb92be4ebb724 which is commit 0d794d0d018f23fb09c50f6ae26868bd6ae343d6 upstream. Andy writes: I think the thing to do is to revert the patch from -stable. The bug it fixes is very minor, and the regression is that it made a pre-existing bug in some nearly-undebuggable core resume code much easier to hit. I don't feel comfortable with a backport of the latter fix until it has a good long soak in Linus' tree. 
Reported-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 518d9286b3d1..2e956afe272c 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -51,19 +51,15 @@ ENTRY(native_usergs_sysret64) END(native_usergs_sysret64) #endif /* CONFIG_PARAVIRT */ -.macro TRACE_IRQS_FLAGS flags:req +.macro TRACE_IRQS_IRETQ #ifdef CONFIG_TRACE_IRQFLAGS - bt $9, \flags /* interrupts off? */ + bt $9, EFLAGS(%rsp) /* interrupts off? */ jnc 1f TRACE_IRQS_ON 1: #endif .endm -.macro TRACE_IRQS_IRETQ - TRACE_IRQS_FLAGS EFLAGS(%rsp) -.endm - /* * When dynamic function tracer is enabled it will add a breakpoint * to all locations that it is about to modify, sync CPUs, update @@ -927,13 +923,11 @@ ENTRY(native_load_gs_index) FRAME_BEGIN pushfq DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI) - TRACE_IRQS_OFF SWAPGS .Lgs_change: movl %edi, %gs 2: ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE SWAPGS - TRACE_IRQS_FLAGS (%rsp) popfq FRAME_END ret From 51a2a68fde2035887c0d74aee1c9569c691dfd61 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 5 Dec 2017 11:26:38 +0100 Subject: [PATCH 0096/1013] Linux 4.14.4 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ede4de0d8634..ba1648c093fe 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 3 +SUBLEVEL = 4 EXTRAVERSION = NAME = Petit Gorille From c8111b1885d3a802e57f895e77aaa1e88de57282 Mon Sep 17 00:00:00 2001 From: Stefan Agner Date: Thu, 9 Nov 2017 15:39:56 +0100 Subject: [PATCH 0097/1013] drm/fsl-dcu: avoid disabling pixel clock twice on suspend commit 9306e996574f7f57136a62e49cd0075f85713623 upstream. With commit 0a70c998d0c5 ("drm/fsl-dcu: enable pixel clock when enabling CRTC") the pixel clock is controlled by the CRTC code. Disabling the pixel clock in suspend leads to a warning due to the second clk_disable_unprepare call: WARNING: CPU: 0 PID: 359 at drivers/clk/clk.c:594 clk_core_disable+0x8c/0x90 Remove clk_disable_unprepare call for pixel clock to avoid unbalanced clock disable on suspend. Fixes: 0a70c998d0c5 ("drm/fsl-dcu: enable pixel clock when enabling CRTC") Signed-off-by: Stefan Agner Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c index 58e9e0601a61..90ad509c7c45 100644 --- a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c +++ b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c @@ -210,7 +210,6 @@ static int fsl_dcu_drm_pm_suspend(struct device *dev) return PTR_ERR(fsl_dev->state); } - clk_disable_unprepare(fsl_dev->pix_clk); clk_disable_unprepare(fsl_dev->clk); return 0; From bacbe44889828852678bc74ad0e942a7684f206f Mon Sep 17 00:00:00 2001 From: Stefan Agner Date: Fri, 10 Nov 2017 10:15:28 +0100 Subject: [PATCH 0098/1013] drm/fsl-dcu: enable IRQ before drm_atomic_helper_resume() commit 9fd99f4f3f5e13ce959900ae57d64b1bdb51d823 upstream. The resume helpers wait for a vblank to occur, hence IRQs need to be enabled.
This avoids a warning as follows during resume: WARNING: CPU: 0 PID: 314 at drivers/gpu/drm/drm_atomic_helper.c:1249 drm_atomic_helper_wait_for_vblanks.part.1+0x284/0x288 [CRTC:28:crtc-0] vblank wait timed out Signed-off-by: Stefan Agner Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c index 90ad509c7c45..faf17b83b910 100644 --- a/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c +++ b/drivers/gpu/drm/fsl-dcu/fsl_dcu_drm_drv.c @@ -232,6 +232,7 @@ static int fsl_dcu_drm_pm_resume(struct device *dev) if (fsl_dev->tcon) fsl_tcon_bypass_enable(fsl_dev->tcon); fsl_dcu_drm_init_planes(fsl_dev->drm); + enable_irq(fsl_dev->irq); drm_atomic_helper_resume(fsl_dev->drm, fsl_dev->state); console_lock(); @@ -239,7 +240,6 @@ static int fsl_dcu_drm_pm_resume(struct device *dev) console_unlock(); drm_kms_helper_poll_enable(fsl_dev->drm); - enable_irq(fsl_dev->irq); return 0; } From 955c907b978589fbd34e9eaaf53527d8cbf589b4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michel=20D=C3=A4nzer?= Date: Wed, 22 Nov 2017 15:55:21 +0100 Subject: [PATCH 0099/1013] drm/amdgpu: Use unsigned ring indices in amdgpu_queue_mgr_map MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit fa7c7939b4bf112cd06ba166b739244077898990 upstream. This matches the corresponding UAPI fields. Treating the ring index as signed could result in accessing random unrelated memory if the MSB was set. Fixes: effd924d2f3b ("drm/amdgpu: untie user ring ids from kernel ring ids v6") Reviewed-by: Alex Deucher Reviewed-by: Christian König Signed-off-by: Michel Dänzer Signed-off-by: Alex Deucher Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 87801faaf264..712ad8c2bdc5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -697,7 +697,7 @@ int amdgpu_queue_mgr_fini(struct amdgpu_device *adev, struct amdgpu_queue_mgr *mgr); int amdgpu_queue_mgr_map(struct amdgpu_device *adev, struct amdgpu_queue_mgr *mgr, - int hw_ip, int instance, int ring, + u32 hw_ip, u32 instance, u32 ring, struct amdgpu_ring **out_ring); /* diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c index befc09b68543..b293380bd46c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c @@ -63,7 +63,7 @@ static int amdgpu_update_cached_map(struct amdgpu_queue_mapper *mapper, static int amdgpu_identity_map(struct amdgpu_device *adev, struct amdgpu_queue_mapper *mapper, - int ring, + u32 ring, struct amdgpu_ring **out_ring) { switch (mapper->hw_ip) { @@ -121,7 +121,7 @@ static enum amdgpu_ring_type amdgpu_hw_ip_to_ring_type(int hw_ip) static int amdgpu_lru_map(struct amdgpu_device *adev, struct amdgpu_queue_mapper *mapper, - int user_ring, + u32 user_ring, struct amdgpu_ring **out_ring) { int r, i, j; @@ -208,7 +208,7 @@ int amdgpu_queue_mgr_fini(struct amdgpu_device *adev, */ int amdgpu_queue_mgr_map(struct amdgpu_device *adev, struct amdgpu_queue_mgr *mgr, - int hw_ip, int instance, int ring, + u32 hw_ip, u32 instance, u32 ring, struct amdgpu_ring **out_ring) { int r, ip_num_rings; From 
b8e212c599082896a180a18a0c9bd529526590be Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 11 Sep 2017 11:24:22 +0200 Subject: [PATCH 0100/1013] s390/runtime instrumentation: simplify task exit handling commit 8d9047f8b967ce6181fd824ae922978e1b055cc0 upstream. Free data structures required for runtime instrumentation from arch_release_task_struct(). This allows to simplify the code a bit, and also makes the semantics a bit easier: arch_release_task_struct() is never called from the task that is being removed. In addition this allows to get rid of exit_thread() in a later patch. Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Cc: Ben Hutchings Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/runtime_instr.h | 4 +++- arch/s390/kernel/process.c | 5 ++--- arch/s390/kernel/runtime_instr.c | 30 +++++++++++++-------------- 3 files changed, 20 insertions(+), 19 deletions(-) diff --git a/arch/s390/include/asm/runtime_instr.h b/arch/s390/include/asm/runtime_instr.h index ea8896ba5afc..2502d05403ef 100644 --- a/arch/s390/include/asm/runtime_instr.h +++ b/arch/s390/include/asm/runtime_instr.h @@ -86,6 +86,8 @@ static inline void restore_ri_cb(struct runtime_instr_cb *cb_next, load_runtime_instr_cb(&runtime_instr_empty_cb); } -void exit_thread_runtime_instr(void); +struct task_struct; + +void runtime_instr_release(struct task_struct *tsk); #endif /* _RUNTIME_INSTR_H */ diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c index 203b7cd7c348..7d4c5500c6c2 100644 --- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -49,10 +49,8 @@ extern void kernel_thread_starter(void); */ void exit_thread(struct task_struct *tsk) { - if (tsk == current) { - exit_thread_runtime_instr(); + if (tsk == current) exit_thread_gs(); - } } void flush_thread(void) @@ -65,6 +63,7 @@ void release_thread(struct task_struct *dead_task) void arch_release_task_struct(struct task_struct *tsk) { + runtime_instr_release(tsk); } int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) diff --git a/arch/s390/kernel/runtime_instr.c b/arch/s390/kernel/runtime_instr.c index d85c64821a6b..94c9ba72cf83 100644 --- a/arch/s390/kernel/runtime_instr.c +++ b/arch/s390/kernel/runtime_instr.c @@ -21,11 +21,24 @@ /* empty control block to disable RI by loading it */ struct runtime_instr_cb runtime_instr_empty_cb; +void runtime_instr_release(struct task_struct *tsk) +{ + kfree(tsk->thread.ri_cb); +} + static void disable_runtime_instr(void) { - struct pt_regs *regs = task_pt_regs(current); + struct task_struct *task = current; + struct pt_regs *regs; + if (!task->thread.ri_cb) + return; + regs = task_pt_regs(task); + preempt_disable(); load_runtime_instr_cb(&runtime_instr_empty_cb); + kfree(task->thread.ri_cb); + task->thread.ri_cb = NULL; + preempt_enable(); /* * Make sure the RI bit is deleted from the PSW. 
If the user did not @@ -46,19 +59,6 @@ static void init_runtime_instr_cb(struct runtime_instr_cb *cb) cb->valid = 1; } -void exit_thread_runtime_instr(void) -{ - struct task_struct *task = current; - - preempt_disable(); - if (!task->thread.ri_cb) - return; - disable_runtime_instr(); - kfree(task->thread.ri_cb); - task->thread.ri_cb = NULL; - preempt_enable(); -} - SYSCALL_DEFINE1(s390_runtime_instr, int, command) { struct runtime_instr_cb *cb; @@ -67,7 +67,7 @@ SYSCALL_DEFINE1(s390_runtime_instr, int, command) return -EOPNOTSUPP; if (command == S390_RUNTIME_INSTR_STOP) { - exit_thread_runtime_instr(); + disable_runtime_instr(); return 0; } From e0264c662ca8ba118715072927af08d96fd90a36 Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Wed, 29 Nov 2017 15:24:22 -0700 Subject: [PATCH 0101/1013] usbip: fix usbip attach to find a port that matches the requested speed commit 1ac7c8a78be85f84b019d3d2742d1a9f07255cc5 upstream. usbip attach fails to find a free port when the device on the first port is a USB_SPEED_SUPER device and a non-SuperSpeed device is being attached. It keeps checking the first port and returns without a match, getting stuck in a loop. Fix the check to find the first port with a matching speed. Reported-by: Juan Zea Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- tools/usb/usbip/libsrc/vhci_driver.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/tools/usb/usbip/libsrc/vhci_driver.c b/tools/usb/usbip/libsrc/vhci_driver.c index 5727dfb15a83..8a1cd1616de4 100644 --- a/tools/usb/usbip/libsrc/vhci_driver.c +++ b/tools/usb/usbip/libsrc/vhci_driver.c @@ -329,9 +329,17 @@ int usbip_vhci_refresh_device_list(void) int usbip_vhci_get_free_port(uint32_t speed) { for (int i = 0; i < vhci_driver->nports; i++) { - if (speed == USB_SPEED_SUPER && - vhci_driver->idev[i].hub != HUB_SPEED_SUPER) - continue; + + switch (speed) { + case USB_SPEED_SUPER: + if (vhci_driver->idev[i].hub != HUB_SPEED_SUPER) + continue; + break; + default: + if (vhci_driver->idev[i].hub != HUB_SPEED_HIGH) + continue; + break; + } if (vhci_driver->idev[i].status == VDEV_ST_NULL) return vhci_driver->idev[i].port; From b76f8338123f06c565574a5f36964f482db829a3 Mon Sep 17 00:00:00 2001 From: Yuyang Du Date: Thu, 30 Nov 2017 10:22:40 +0800 Subject: [PATCH 0102/1013] usbip: Fix USB device hang due to wrong enabling of scatter-gather commit 770b2edece42fa55bbe7d4cbe53347a07b8968d4 upstream. The previous USB3 SuperSpeed enabling patches mistakenly enabled URB scatter-gather chaining, which is actually not supported by the VHCI HCD. This patch fixes that.
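For background, a hedged sketch of the mechanism involved (not the vhci code; example_hcd_setup is a made-up name): an HCD advertises scatter-gather support by raising sg_tablesize in its setup callback, which is exactly what vhci-hcd must not do since it cannot walk urb->sg lists. Leaving the field at its default of 0 keeps SG chaining off:

#include <linux/usb.h>
#include <linux/usb/hcd.h>

static int example_hcd_setup(struct usb_hcd *hcd)
{
        /*
         * ~0 advertises "any number of SG entries" to the USB core.
         * A host controller driver without real SG support must not set this.
         */
        hcd->self.sg_tablesize = ~0;
        return 0;
}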
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=197867 Fixes: 03cd00d538a6feb ("usbip: vhci-hcd: Set the vhci structure up to work") Reported-by: Juan Zea Signed-off-by: Yuyang Du Acked-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/vhci_hcd.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c index 11b9a22799cc..1f0cf81cc145 100644 --- a/drivers/usb/usbip/vhci_hcd.c +++ b/drivers/usb/usbip/vhci_hcd.c @@ -1112,7 +1112,6 @@ static int hcd_name_to_id(const char *name) static int vhci_setup(struct usb_hcd *hcd) { struct vhci *vhci = *((void **)dev_get_platdata(hcd->self.controller)); - hcd->self.sg_tablesize = ~0; if (usb_hcd_is_primary_hcd(hcd)) { vhci->vhci_hcd_hs = hcd_to_vhci_hcd(hcd); vhci->vhci_hcd_hs->vhci = vhci; From b6e6e58814d5583bef5c75734324b27f77f70f1b Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Tue, 14 Nov 2017 19:27:22 +0100 Subject: [PATCH 0103/1013] uas: Always apply US_FL_NO_ATA_1X quirk to Seagate devices commit 7fee72d5e8f1e7b8d8212e28291b1a0243ecf2f1 upstream. We've been adding this as a quirk on a per device basis hoping that newer disk enclosures would do better, but that has not happened, so simply apply this quirk to all Seagate devices. Signed-off-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman --- drivers/usb/storage/uas-detect.h | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/usb/storage/uas-detect.h b/drivers/usb/storage/uas-detect.h index 1fcd758a961f..3734a25e09e5 100644 --- a/drivers/usb/storage/uas-detect.h +++ b/drivers/usb/storage/uas-detect.h @@ -112,6 +112,10 @@ static int uas_use_uas_driver(struct usb_interface *intf, } } + /* All Seagate disk enclosures have broken ATA pass-through support */ + if (le16_to_cpu(udev->descriptor.idVendor) == 0x0bc2) + flags |= US_FL_NO_ATA_1X; + usb_stor_adjust_quirks(udev, &flags); if (flags & US_FL_IGNORE_UAS) { From 502ae7582aacc85ffd38fa820e4c734a230ccc25 Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Tue, 14 Nov 2017 01:31:15 -0500 Subject: [PATCH 0104/1013] usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub commit e43a12f1793ae1fe006e26fe9327a8840a92233c upstream. KY-688 USB 3.1 Type-C Hub internally uses a Genesys Logic hub to connect to Realtek r8153. Similar to commit ("7496cfe5431f2 usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter"), no-lpm can make r8153 ethernet work. Signed-off-by: Kai-Heng Feng Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 37c418e581fb..50010282c010 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -151,6 +151,9 @@ static const struct usb_device_id usb_quirk_list[] = { /* appletouch */ { USB_DEVICE(0x05ac, 0x021a), .driver_info = USB_QUIRK_RESET_RESUME }, + /* Genesys Logic hub, internally used by KY-688 USB 3.1 Type-C Hub */ + { USB_DEVICE(0x05e3, 0x0612), .driver_info = USB_QUIRK_NO_LPM }, + /* Genesys Logic hub, internally used by Moshi USB to Ethernet Adapter */ { USB_DEVICE(0x05e3, 0x0616), .driver_info = USB_QUIRK_NO_LPM }, From 45c17d9b6ea66925c8aadbb2973c48d668f673fc Mon Sep 17 00:00:00 2001 From: Matt Wilson Date: Mon, 13 Nov 2017 11:31:31 -0800 Subject: [PATCH 0105/1013] serial: 8250_pci: Add Amazon PCI serial device ID commit 3bfd1300abfe3adb18e84a89d97a0e82a22124bb upstream. 
This device will be used in future Amazon EC2 instances as the primary serial port (i.e., data sent to this port will be available via the GetConsoleOuput [1] EC2 API). [1] http://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_GetConsoleOutput.html Signed-off-by: Matt Wilson Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/tty/serial/8250/8250_pci.c b/drivers/tty/serial/8250/8250_pci.c index 0c101a7470b0..d4e7be88e0da 100644 --- a/drivers/tty/serial/8250/8250_pci.c +++ b/drivers/tty/serial/8250/8250_pci.c @@ -5137,6 +5137,9 @@ static const struct pci_device_id serial_pci_tbl[] = { { PCI_DEVICE(0x1601, 0x0800), .driver_data = pbn_b0_4_1250000 }, { PCI_DEVICE(0x1601, 0xa801), .driver_data = pbn_b0_4_1250000 }, + /* Amazon PCI serial device */ + { PCI_DEVICE(0x1d0f, 0x8250), .driver_data = pbn_b0_1_115200 }, + /* * These entries match devices with class COMMUNICATION_SERIAL, * COMMUNICATION_MODEM or COMMUNICATION_MULTISERIAL From 22ad53793edbc73a5a12f6960749bc967551fde4 Mon Sep 17 00:00:00 2001 From: Martijn Coenen Date: Mon, 13 Nov 2017 10:06:08 +0100 Subject: [PATCH 0106/1013] ANDROID: binder: fix transaction leak. commit fb2c445277e7b0b4ffe10de8114bad4eccaca948 upstream. If a call to put_user() fails, we failed to properly free a transaction and send a failed reply (if necessary). Signed-off-by: Martijn Coenen Signed-off-by: Greg Kroah-Hartman --- drivers/android/binder.c | 40 +++++++++++++++++++++++++++++++--------- 1 file changed, 31 insertions(+), 9 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index fddf76ef5bd6..88b4bbe58100 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -1947,6 +1947,26 @@ static void binder_send_failed_reply(struct binder_transaction *t, } } +/** + * binder_cleanup_transaction() - cleans up undelivered transaction + * @t: transaction that needs to be cleaned up + * @reason: reason the transaction wasn't delivered + * @error_code: error to return to caller (if synchronous call) + */ +static void binder_cleanup_transaction(struct binder_transaction *t, + const char *reason, + uint32_t error_code) +{ + if (t->buffer->target_node && !(t->flags & TF_ONE_WAY)) { + binder_send_failed_reply(t, error_code); + } else { + binder_debug(BINDER_DEBUG_DEAD_TRANSACTION, + "undelivered transaction %d, %s\n", + t->debug_id, reason); + binder_free_transaction(t); + } +} + /** * binder_validate_object() - checks for a valid metadata object in a buffer. * @buffer: binder_buffer that we're parsing. 
@@ -4015,12 +4035,20 @@ static int binder_thread_read(struct binder_proc *proc, if (put_user(cmd, (uint32_t __user *)ptr)) { if (t_from) binder_thread_dec_tmpref(t_from); + + binder_cleanup_transaction(t, "put_user failed", + BR_FAILED_REPLY); + return -EFAULT; } ptr += sizeof(uint32_t); if (copy_to_user(ptr, &tr, sizeof(tr))) { if (t_from) binder_thread_dec_tmpref(t_from); + + binder_cleanup_transaction(t, "copy_to_user failed", + BR_FAILED_REPLY); + return -EFAULT; } ptr += sizeof(tr); @@ -4090,15 +4118,9 @@ static void binder_release_work(struct binder_proc *proc, struct binder_transaction *t; t = container_of(w, struct binder_transaction, work); - if (t->buffer->target_node && - !(t->flags & TF_ONE_WAY)) { - binder_send_failed_reply(t, BR_DEAD_REPLY); - } else { - binder_debug(BINDER_DEBUG_DEAD_TRANSACTION, - "undelivered transaction %d\n", - t->debug_id); - binder_free_transaction(t); - } + + binder_cleanup_transaction(t, "process died.", + BR_DEAD_REPLY); } break; case BINDER_WORK_RETURN_ERROR: { struct binder_error *e = container_of( From b200d9899afe672d66c0d8d618c63455d21913ea Mon Sep 17 00:00:00 2001 From: Sebastian Sjoholm Date: Mon, 20 Nov 2017 19:29:32 +0100 Subject: [PATCH 0107/1013] USB: serial: option: add Quectel BG96 id commit c654b21ede93845863597de9ad774fd30db5f2ab upstream. Quectel BG96 is an Qualcomm MDM9206 based IoT modem, supporting both CAT-M and NB-IoT. Tested hardware is BG96 mounted on Quectel development board (EVB). The USB id is added to option.c to allow DIAG,GPS,AT and modem communication with the BG96. Signed-off-by: Sebastian Sjoholm Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index ba672cf4e888..54e316b1892d 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -241,6 +241,7 @@ static void option_instat_callback(struct urb *urb); /* These Quectel products use Quectel's vendor ID */ #define QUECTEL_PRODUCT_EC21 0x0121 #define QUECTEL_PRODUCT_EC25 0x0125 +#define QUECTEL_PRODUCT_BG96 0x0296 #define CMOTECH_VENDOR_ID 0x16d8 #define CMOTECH_PRODUCT_6001 0x6001 @@ -1185,6 +1186,8 @@ static const struct usb_device_id option_ids[] = { .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC25), .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, + { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_BG96), + .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6001) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_CMU_300) }, { USB_DEVICE(CMOTECH_VENDOR_ID, CMOTECH_PRODUCT_6003), From 2a53dae2e338bcd32d8fac11ee87c01ab873fec3 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 28 Nov 2017 12:40:59 +0800 Subject: [PATCH 0108/1013] USB: serial: usb_debug: add new USB device id commit 762ff4678e89a5e3f8b2237533e04d3ef2737e78 upstream. USB vendor id and product id for Linux USB Debug Target is added. 
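As a reminder of how such IDs take effect (sketch only; the table name is hypothetical), the driver matches hardware through a zero-terminated usb_device_id table built from vendor/product pairs:

#include <linux/module.h>
#include <linux/usb.h>

static const struct usb_device_id example_dbc_ids[] = {
        { USB_DEVICE(0x1d6b, 0x0010) }, /* newly added debug target ID */
        { USB_DEVICE(0x1d6b, 0x0011) }, /* previously supported ID */
        { }                             /* terminating all-zero entry */
};
MODULE_DEVICE_TABLE(usb, example_dbc_ids);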
Signed-off-by: Lu Baolu Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/usb_debug.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/usb/serial/usb_debug.c b/drivers/usb/serial/usb_debug.c index 48f285a1ad00..c593ca8800e5 100644 --- a/drivers/usb/serial/usb_debug.c +++ b/drivers/usb/serial/usb_debug.c @@ -34,12 +34,14 @@ static const struct usb_device_id id_table[] = { }; static const struct usb_device_id dbc_id_table[] = { + { USB_DEVICE(0x1d6b, 0x0010) }, { USB_DEVICE(0x1d6b, 0x0011) }, { }, }; static const struct usb_device_id id_table_combined[] = { { USB_DEVICE(0x0525, 0x127a) }, + { USB_DEVICE(0x1d6b, 0x0010) }, { USB_DEVICE(0x1d6b, 0x0011) }, { }, }; From 64dbff3359fc3865a000d051143ade71aea1242b Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Wed, 22 Nov 2017 09:57:28 +0000 Subject: [PATCH 0109/1013] serial: 8250_early: Only set divisor if valid clk & baud commit 0ff3ab701963a845d52337ded7eebf2d1a14fe00 upstream. If either uartclk or baud are 0, avoid calculating and setting a divisor based on them since the output will almost certainly be garbage. This also allows platforms such as the MIPS generic kernel, which has no way to know a valid BASE_BASE for the board it is actually booted on at compile time, to set BASE_BAUD to 0 and avoid early_8250 setting a bad divisor. This fixes a regression caused by commit 31cb9a8575ca ("earlycon: initialise baud field of earlycon device structure"), which changed the behavior of of_setup_earlycon such that it sets a baud rate in the earlycon structure where previously it was left as 0. All boards supported by the MIPS generic kernel started outputting garbage from the boot console due to an incorrect divisor being set. Fixes: 31cb9a8575ca ("earlycon: initialise baud field of earlycon device structure") Signed-off-by: Matt Redfearn Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_early.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/tty/serial/8250/8250_early.c b/drivers/tty/serial/8250/8250_early.c index af72ec32e404..f135c1846477 100644 --- a/drivers/tty/serial/8250/8250_early.c +++ b/drivers/tty/serial/8250/8250_early.c @@ -125,12 +125,14 @@ static void __init init_port(struct earlycon_device *device) serial8250_early_out(port, UART_FCR, 0); /* no fifo */ serial8250_early_out(port, UART_MCR, 0x3); /* DTR + RTS */ - divisor = DIV_ROUND_CLOSEST(port->uartclk, 16 * device->baud); - c = serial8250_early_in(port, UART_LCR); - serial8250_early_out(port, UART_LCR, c | UART_LCR_DLAB); - serial8250_early_out(port, UART_DLL, divisor & 0xff); - serial8250_early_out(port, UART_DLM, (divisor >> 8) & 0xff); - serial8250_early_out(port, UART_LCR, c & ~UART_LCR_DLAB); + if (port->uartclk && device->baud) { + divisor = DIV_ROUND_CLOSEST(port->uartclk, 16 * device->baud); + c = serial8250_early_in(port, UART_LCR); + serial8250_early_out(port, UART_LCR, c | UART_LCR_DLAB); + serial8250_early_out(port, UART_DLL, divisor & 0xff); + serial8250_early_out(port, UART_DLM, (divisor >> 8) & 0xff); + serial8250_early_out(port, UART_LCR, c & ~UART_LCR_DLAB); + } } int __init early_serial8250_setup(struct earlycon_device *device, From 07c925116820b3def5610a917cad3901d69d18d9 Mon Sep 17 00:00:00 2001 From: Matt Redfearn Date: Tue, 28 Nov 2017 15:22:20 +0000 Subject: [PATCH 0110/1013] MIPS: Add custom serial.h with BASE_BAUD override for generic kernel commit c8ec2041f549e7f2dee0c34d25381be6f7805f99 upstream. 
Add a custom serial.h header for MIPS, allowing platforms to override the asm-generic version if required. The generic platform uses this header to set BASE_BAUD to 0. The generic platform supports multiple boards, which may have different UART clocks. Also one of the boards supported is the Boston FPGA board, where the UART clock depends on the loaded FPGA bitfile. As such there is no way that the generic kernel can set a compile time default BASE_BAUD. Commit 31cb9a8575ca ("earlycon: initialise baud field of earlycon device structure") changed the behavior of of_setup_earlycon such that any baud rate set in the device tree is now set in the earlycon structure. The UART driver will then calculate a divisor based on BASE_BAUD and set it. With MIPS generic kernels this resulted in garbage output due to the incorrect uart clock rate being used to calculate a divisor. This commit, combined with "serial: 8250_early: Only set divisor if valid clk & baud" prevents the earlycon code setting a bad divisor and restores earlycon output. Fixes: 31cb9a8575ca ("earlycon: initialise baud field of earlycon device structure") Signed-off-by: Matt Redfearn Signed-off-by: Greg Kroah-Hartman --- arch/mips/include/asm/Kbuild | 1 - arch/mips/include/asm/serial.h | 22 ++++++++++++++++++++++ 2 files changed, 22 insertions(+), 1 deletion(-) create mode 100644 arch/mips/include/asm/serial.h diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild index 7c8aab23bce8..b1f66699677d 100644 --- a/arch/mips/include/asm/Kbuild +++ b/arch/mips/include/asm/Kbuild @@ -16,7 +16,6 @@ generic-y += qrwlock.h generic-y += qspinlock.h generic-y += sections.h generic-y += segment.h -generic-y += serial.h generic-y += trace_clock.h generic-y += unaligned.h generic-y += user.h diff --git a/arch/mips/include/asm/serial.h b/arch/mips/include/asm/serial.h new file mode 100644 index 000000000000..1d830c6666c2 --- /dev/null +++ b/arch/mips/include/asm/serial.h @@ -0,0 +1,22 @@ +/* + * Copyright (C) 2017 MIPS Tech, LLC + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. + */ +#ifndef __ASM__SERIAL_H +#define __ASM__SERIAL_H + +#ifdef CONFIG_MIPS_GENERIC +/* + * Generic kernels cannot know a correct value for all platforms at + * compile time. Set it to 0 to prevent 8250_early using it + */ +#define BASE_BAUD 0 +#else +#include +#endif + +#endif /* __ASM__SERIAL_H */ From 7c503475ae2aa08c10e50e56c9e12efe6415eefa Mon Sep 17 00:00:00 2001 From: Boshi Wang Date: Fri, 20 Oct 2017 16:01:03 +0800 Subject: [PATCH 0111/1013] ima: fix hash algorithm initialization [ Upstream commit ebe7c0a7be92bbd34c6ff5b55810546a0ee05bee ] The hash_setup function always sets the hash_setup_done flag, even when the hash algorithm is invalid. This prevents the default hash algorithm defined as CONFIG_IMA_DEFAULT_HASH from being used. This patch sets hash_setup_done flag only for valid hash algorithms. 
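The corrected control flow follows the usual __setup() parser pattern, sketched below with hypothetical names (this is not the IMA code itself): unrecognised values must return before the done flag is latched, so the compiled-in default survives:

#include <linux/init.h>
#include <linux/string.h>

static int example_hash_algo __initdata = 1;    /* compiled-in default */
static int example_setup_done __initdata;

static int __init example_hash_setup(char *str)
{
        if (example_setup_done)
                return 1;
        if (strcmp(str, "sha1") == 0)
                example_hash_algo = 1;
        else if (strcmp(str, "sha256") == 0)
                example_hash_algo = 2;
        else
                return 1;       /* unknown algorithm: keep the default, stay "not done" */
        example_setup_done = 1;
        return 1;
}
__setup("example_hash=", example_hash_setup);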
Fixes: e7a2ad7eb6f4 "ima: enable support for larger default filedata hash algorithms" Signed-off-by: Boshi Wang Signed-off-by: Mimi Zohar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- security/integrity/ima/ima_main.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c index 2aebb7984437..ab70a395f490 100644 --- a/security/integrity/ima/ima_main.c +++ b/security/integrity/ima/ima_main.c @@ -51,6 +51,8 @@ static int __init hash_setup(char *str) ima_hash_algo = HASH_ALGO_SHA1; else if (strncmp(str, "md5", 3) == 0) ima_hash_algo = HASH_ALGO_MD5; + else + return 1; goto out; } @@ -60,6 +62,8 @@ static int __init hash_setup(char *str) break; } } + if (i == HASH_ALGO__LAST) + return 1; out: hash_setup_done = 1; return 1; From f0bf7b7396a474cf3d61ebd7669bb20f914289fc Mon Sep 17 00:00:00 2001 From: "Jason J. Herne" Date: Tue, 7 Nov 2017 10:22:32 -0500 Subject: [PATCH 0112/1013] s390: vfio-ccw: Do not attempt to free no-op, test and tic cda. [ Upstream commit 408358b50deaf59b07c82a7bff8c7e7cce031fae ] Because we do not make use of the cda (channel data address) for test, no-op ccws no address translation takes place. This means cda could contain a guest address which we do not want to attempt to free. Let's check the command type and skip cda free when it is not needed. For a TIC ccw, ccw->cda points to either a ccw in an existing chain or it points to a whole new allocated chain. In either case the data will be freed when the owning chain is freed. Signed-off-by: Jason J. Herne Reviewed-by: Dong Jia Shi Reviewed-by: Pierre Morel Message-Id: <1510068152-21988-1-git-send-email-jjherne@linux.vnet.ibm.com> Reviewed-by: Halil Pasic Acked-by: Christian Borntraeger Signed-off-by: Cornelia Huck Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/s390/cio/vfio_ccw_cp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/s390/cio/vfio_ccw_cp.c b/drivers/s390/cio/vfio_ccw_cp.c index f20b4d66c75f..4a39b54732d0 100644 --- a/drivers/s390/cio/vfio_ccw_cp.c +++ b/drivers/s390/cio/vfio_ccw_cp.c @@ -330,6 +330,8 @@ static void ccwchain_cda_free(struct ccwchain *chain, int idx) { struct ccw1 *ccw = chain->ch_ccw + idx; + if (ccw_is_test(ccw) || ccw_is_noop(ccw) || ccw_is_tic(ccw)) + return; if (!ccw->count) return; From ca04b90f9d8e053e81f630610076d00239129c1f Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 8 Nov 2017 10:11:02 +0100 Subject: [PATCH 0113/1013] PM / Domains: Fix genpd to deal with drivers returning 1 from ->prepare() [ Upstream commit 5241ab40f6e742f8a1631f8826faf6dc6412b3b5 ] During system-wide PM, genpd relies on its PM callbacks to be invoked for all its attached devices, as to deal with powering off/on the PM domain. In other words, genpd is not compatible with the direct_complete path, if executed by the PM core for any of its attached devices. However, when genpd's ->prepare() callback invokes pm_generic_prepare(), it does not take into account that it may return 1. Instead it treats that as an error internally and expects the PM core to abort the prepare phase and roll back. This leads to genpd not properly powering on/off the PM domain, because its internal counters gets wrongly balanced. To fix the behaviour, allow drivers to return 1 from their ->prepare() callbacks, but let's return 0 from genpd's ->prepare() callback in such case, as that prevents the PM core from running the direct_complete path for the device. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. 
Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/base/power/domain.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c index e8ca5e2cf1e5..70f8904f46a3 100644 --- a/drivers/base/power/domain.c +++ b/drivers/base/power/domain.c @@ -921,7 +921,7 @@ static int pm_genpd_prepare(struct device *dev) genpd_unlock(genpd); ret = pm_generic_prepare(dev); - if (ret) { + if (ret < 0) { genpd_lock(genpd); genpd->prepared_count--; @@ -929,7 +929,8 @@ static int pm_genpd_prepare(struct device *dev) genpd_unlock(genpd); } - return ret; + /* Never return 1, as genpd don't cope with the direct_complete path. */ + return ret >= 0 ? 0 : ret; } /** From 3dde14f7f78b337d43fc1a6877cb283023121f2f Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Mon, 30 Oct 2017 14:38:58 +0100 Subject: [PATCH 0114/1013] s390/pci: do not require AIS facility [ Upstream commit 48070c73058be6de9c0d754d441ed7092dfc8f12 ] As of today QEMU does not provide the AIS facility to its guest. This prevents Linux guests from using PCI devices as the ais facility is checked during init. As this is just a performance optimization, we can move the ais check into the code where we need it (calling the SIC instruction). This is used at initialization and on interrupt. Both places do not require any serialization, so we can simply skip the instruction. Since we will now get all interrupts, we can also avoid the 2nd scan. As we can have multiple interrupts in parallel we might trigger spurious irqs more often for the non-AIS case but the core code can handle that. Signed-off-by: Christian Borntraeger Reviewed-by: Pierre Morel Reviewed-by: Halil Pasic Acked-by: Sebastian Ott Signed-off-by: Heiko Carstens Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/pci_insn.h | 2 +- arch/s390/pci/pci.c | 5 +++-- arch/s390/pci/pci_insn.c | 6 +++++- 3 files changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/s390/include/asm/pci_insn.h b/arch/s390/include/asm/pci_insn.h index 419e83fa4721..ba22a6ea51a1 100644 --- a/arch/s390/include/asm/pci_insn.h +++ b/arch/s390/include/asm/pci_insn.h @@ -82,6 +82,6 @@ int zpci_refresh_trans(u64 fn, u64 addr, u64 range); int zpci_load(u64 *data, u64 req, u64 offset); int zpci_store(u64 data, u64 req, u64 offset); int zpci_store_block(const u64 *data, u64 req, u64 offset); -void zpci_set_irq_ctrl(u16 ctl, char *unused, u8 isc); +int zpci_set_irq_ctrl(u16 ctl, char *unused, u8 isc); #endif diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index a25d95a6612d..0fe649c0d542 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -368,7 +368,8 @@ static void zpci_irq_handler(struct airq_struct *airq) /* End of second scan with interrupts on. */ break; /* First scan complete, reenable interrupts. 
*/ - zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, NULL, PCI_ISC); + if (zpci_set_irq_ctrl(SIC_IRQ_MODE_SINGLE, NULL, PCI_ISC)) + break; si = 0; continue; } @@ -956,7 +957,7 @@ static int __init pci_base_init(void) if (!s390_pci_probe) return 0; - if (!test_facility(69) || !test_facility(71) || !test_facility(72)) + if (!test_facility(69) || !test_facility(71)) return 0; rc = zpci_debug_init(); diff --git a/arch/s390/pci/pci_insn.c b/arch/s390/pci/pci_insn.c index ea34086c8674..81b840bc6e4e 100644 --- a/arch/s390/pci/pci_insn.c +++ b/arch/s390/pci/pci_insn.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -91,11 +92,14 @@ int zpci_refresh_trans(u64 fn, u64 addr, u64 range) } /* Set Interruption Controls */ -void zpci_set_irq_ctrl(u16 ctl, char *unused, u8 isc) +int zpci_set_irq_ctrl(u16 ctl, char *unused, u8 isc) { + if (!test_facility(72)) + return -EIO; asm volatile ( " .insn rsy,0xeb00000000d1,%[ctl],%[isc],%[u]\n" : : [ctl] "d" (ctl), [isc] "d" (isc << 27), [u] "Q" (*unused)); + return 0; } /* PCI Load */ From 9aaa793b6b05b57cf37814ccf0d992035ea41407 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Sat, 4 Nov 2017 04:19:52 -0700 Subject: [PATCH 0115/1013] selftests/x86/ldt_get: Add a few additional tests for limits [ Upstream commit fec8f5ae1715a01c72ad52cb2ecd8aacaf142302 ] We weren't testing the .limit and .limit_in_pages fields very well. Add more tests. This addition seems to trigger the "bits 16:19 are undefined" issue that was fixed in an earlier patch. I think that, at least on my CPU, the high nibble of the limit ends in LAR bits 16:19. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/5601c15ea9b3113d288953fd2838b18bedf6bc67.1509794321.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/x86/ldt_gdt.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c index 961e3ee26c27..b1779da72428 100644 --- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -367,9 +367,24 @@ static void do_simple_tests(void) install_invalid(&desc, false); desc.seg_not_present = 0; - desc.read_exec_only = 0; desc.seg_32bit = 1; + desc.read_exec_only = 0; + desc.limit = 0xfffff; + install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P | AR_DB); + + desc.limit_in_pages = 1; + + install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA | AR_S | AR_P | AR_DB | AR_G); + desc.read_exec_only = 1; + install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA | AR_S | AR_P | AR_DB | AR_G); + desc.contents = 1; + desc.read_exec_only = 0; + install_valid(&desc, AR_DPL3 | AR_TYPE_RWDATA_EXPDOWN | AR_S | AR_P | AR_DB | AR_G); + desc.read_exec_only = 1; + install_valid(&desc, AR_DPL3 | AR_TYPE_RODATA_EXPDOWN | AR_S | AR_P | AR_DB | AR_G); + + desc.limit = 0; install_invalid(&desc, true); } From 898fe968f78c45e12fed31aee35a118a4e26e1b4 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Sat, 4 Nov 2017 04:19:49 -0700 Subject: [PATCH 0116/1013] selftests/x86/ldt_gdt: Robustify against set_thread_area() and LAR oddities [ Upstream commit d60ad744c9741586010d4bea286f09a063a90fbd ] Bits 19:16 of LAR's result are undefined, and some upcoming improvements to the test case seem to trigger this. Mask off those bits to avoid spurious failures. 
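Concretely, the comparison ends up being performed on a masked value, roughly as in this simplified sketch (the helper name is made up, not part of the selftest):

/* Ignore LAR bits 19:16, which the SDM leaves undefined. */
static int example_ar_matches(unsigned int ar, unsigned int expected_ar)
{
        ar &= ~0xF0000u;
        return ar == expected_ar;
}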
commit 5b781c7e317f ("x86/tls: Forcibly set the accessed bit in TLS segments") adds a valid case in which LAR's output doesn't quite agree with set_thread_area()'s input. This isn't triggered in the test as is, but it will be if we start calling set_thread_area() with the accessed bit clear. Work around this discrepancy. I've added a Fixes tag so that -stable can pick this up if necessary. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: 5b781c7e317f ("x86/tls: Forcibly set the accessed bit in TLS segments") Link: http://lkml.kernel.org/r/b82f3f89c034b53580970ac865139fd8863f44e2.1509794321.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/x86/ldt_gdt.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c index b1779da72428..2afc41a3730f 100644 --- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -115,7 +115,15 @@ static void check_valid_segment(uint16_t index, int ldt, return; } - if (ar != expected_ar) { + /* The SDM says "bits 19:16 are undefined". Thanks. */ + ar &= ~0xF0000; + + /* + * NB: Different Linux versions do different things with the + * accessed bit in set_thread_area(). + */ + if (ar != expected_ar && + (ldt || ar != (expected_ar | AR_ACCESSED))) { printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n", (ldt ? "LDT" : "GDT"), index, ar, expected_ar); nerrs++; From ff8cc6f882c735df53ae2e2fdde8362cafe14100 Mon Sep 17 00:00:00 2001 From: Bryan O'Donoghue Date: Mon, 6 Nov 2017 01:32:20 +0000 Subject: [PATCH 0117/1013] staging: greybus: loopback: Fix iteration count on async path [ Upstream commit 44b02da39210e6dd67e39ff1f48d30c56d384240 ] Commit 12927835d211 ("greybus: loopback: Add asynchronous bi-directional support") does what it says on the tin - namely, adds support for asynchronous bi-directional loopback operations. What it neglects to do though is increment the per-connection gb->iteration_count on an asynchronous operation error. This patch fixes that omission. Fixes: 12927835d211 ("greybus: loopback: Add asynchronous bi-directional support") Signed-off-by: Bryan O'Donoghue Reported-by: Mitch Tasman Reviewed-by: Johan Hovold Cc: Alex Elder Cc: Mitch Tasman Cc: greybus-dev@lists.linaro.org Cc: devel@driverdev.osuosl.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/greybus/loopback.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index 08e255884206..93e86798ec1c 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -1042,8 +1042,10 @@ static int gb_loopback_fn(void *data) else if (type == GB_LOOPBACK_TYPE_SINK) error = gb_loopback_async_sink(gb, size); - if (error) + if (error) { gb->error++; + gb->iteration_count++; + } } else { /* We are effectively single threaded here */ if (type == GB_LOOPBACK_TYPE_PING) From 162cefc5547b95f24c1a2aad2af1817381afa6a6 Mon Sep 17 00:00:00 2001 From: Greg Ungerer Date: Tue, 5 Sep 2017 22:57:06 +1000 Subject: [PATCH 0118/1013] m68k: fix ColdFire node shift size calculation [ Upstream commit f55ab8f27548ff3431a6567d400c6757c49fd520 ] The m68k pg_data_table is a fixed-size array defined in arch/m68k/mm/init.c.
Index numbers within it are defined based on memory size. But for Coldfire these don't take into account a non-zero physical RAM base address, and this causes us to access past the end of this array at system start time. Change the node shift calculation so that we keep the index inside its range. Reported-by: Angelo Dureghello Tested-by: Angelo Dureghello Signed-off-by: Greg Ungerer Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/m68k/mm/mcfmmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/m68k/mm/mcfmmu.c b/arch/m68k/mm/mcfmmu.c index 8d1408583cf4..b523a604cb87 100644 --- a/arch/m68k/mm/mcfmmu.c +++ b/arch/m68k/mm/mcfmmu.c @@ -170,7 +170,7 @@ void __init cf_bootmem_alloc(void) max_pfn = max_low_pfn = PFN_DOWN(_ramend); high_memory = (void *)_ramend; - m68k_virt_to_node_shift = fls(_ramend - _rambase - 1) - 6; + m68k_virt_to_node_shift = fls(_ramend - 1) - 6; module_fixup(NULL, __start_fixup, __stop_fixup); /* setup bootmem data */ From 4bee9d1d1c54c8bf9ff8149ccacbaf398f524a37 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Sat, 28 Oct 2017 11:35:49 +0200 Subject: [PATCH 0119/1013] serial: 8250_fintek: Fix rs485 disablement on invalid ioctl() [ Upstream commit 3236a965486ba0c6043cf2c7b51943d8b382ae29 ] This driver's ->rs485_config callback checks if SER_RS485_RTS_ON_SEND and SER_RS485_RTS_AFTER_SEND have the same value. If they do, it means the user has passed in invalid data with the TIOCSRS485 ioctl() since RTS must have a different polarity when sending and when not sending. In this case, rs485 mode is not enabled (the RS485_URA bit is not set in the RS485 Enable Register) and this is supposed to be signaled back to the user by clearing the SER_RS485_ENABLED bit in struct serial_rs485 ... except a missing tilde character is preventing that from happening. Fixes: 28e3fb6c4dce ("serial: Add support for Fintek F81216A LPC to 4 UART") Cc: Ricardo Ribalda Delgado Cc: "Ji-Ze Hong (Peter Hong)" Signed-off-by: Lukas Wunner Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_fintek.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/serial/8250/8250_fintek.c b/drivers/tty/serial/8250/8250_fintek.c index 4bd376c08b59..ba4af5434b91 100644 --- a/drivers/tty/serial/8250/8250_fintek.c +++ b/drivers/tty/serial/8250/8250_fintek.c @@ -211,7 +211,7 @@ static int fintek_8250_rs485_config(struct uart_port *port, if ((!!(rs485->flags & SER_RS485_RTS_ON_SEND)) == (!!(rs485->flags & SER_RS485_RTS_AFTER_SEND))) - rs485->flags &= SER_RS485_ENABLED; + rs485->flags &= ~SER_RS485_ENABLED; else config |= RS485_URA; From a99611910dd745e940fc1a0cdab15aadf07c3e23 Mon Sep 17 00:00:00 2001 From: Stanislaw Gruszka Date: Mon, 23 Oct 2017 11:35:59 +0200 Subject: [PATCH 0120/1013] staging: rtl8822be: fix wrong dma unmap len [ Upstream commit c40a45a465e9eab72cfdd3ab69d15cf8ef8b89c8 ] Patch fixes splat: r8822be 0000:04:00.0: DMA-API: device driver frees DMA memory with different size [device address=0x0000000078477000] [map size=4096 bytes] [unmap size=424 bytes] Call Trace: debug_dma_unmap_page+0xa5/0xb0 ? unmap_single+0x2f/0x40 _rtl8822be_send_bcn_or_cmd_packet+0x2c5/0x300 [r8822be] ? 
_rtl8822be_send_bcn_or_cmd_packet+0x2c5/0x300 [r8822be] rtl8822b_halmac_cb_write_data_rsvd_page+0x51/0xc0 [r8822be] _halmac_write_data_rsvd_page+0x22/0x30 [r8822be] halmac_download_rsvd_page_88xx+0xee/0x1f0 [r8822be] halmac_dlfw_to_mem_88xx+0x80/0x120 [r8822be] halmac_download_firmware_88xx.part.47+0x477/0x600 [r8822be] halmac_download_firmware_88xx+0x32/0x40 [r8822be] rtl_halmac_dlfw+0x70/0x120 [r8822be] rtl_halmac_init_hal+0x5f/0x1b0 [r8822be] rtl8822be_hw_init+0x8a2/0x1040 [r8822be] Signed-off-by: Stanislaw Gruszka Acked-by: Larry Finger Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtlwifi/rtl8822be/fw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/rtlwifi/rtl8822be/fw.c b/drivers/staging/rtlwifi/rtl8822be/fw.c index 8e24da16752c..a2cc54866e79 100644 --- a/drivers/staging/rtlwifi/rtl8822be/fw.c +++ b/drivers/staging/rtlwifi/rtl8822be/fw.c @@ -419,7 +419,7 @@ static bool _rtl8822be_send_bcn_or_cmd_packet(struct ieee80211_hw *hw, dma_addr = rtlpriv->cfg->ops->get_desc( hw, (u8 *)pbd_desc, true, HW_DESC_TXBUFF_ADDR); - pci_unmap_single(rtlpci->pdev, dma_addr, skb->len, + pci_unmap_single(rtlpci->pdev, dma_addr, pskb->len, PCI_DMA_TODEVICE); kfree_skb(pskb); From e3912c2aef4ea046568ace227aa5d54bad44cac9 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 20 Oct 2017 20:40:24 +0200 Subject: [PATCH 0121/1013] staging: rtl8188eu: avoid a null dereference on pmlmepriv [ Upstream commit 123c0aab0050cd0e07ce18e453389fbbb0a5a425 ] There is a check on pmlmepriv before dereferencing it when vfree'ing pmlmepriv->free_bss_buf however the previous call to rtw_free_mlme_priv_ie_data deferences pmlmepriv causing a null pointer deference if it is null. Avoid this by also calling rtw_free_mlme_priv_ie_data if the pointer is non-null. Detected by CoverityScan, CID#1230262 ("Dereference before null check") Fixes: 7b464c9fa5cc ("staging: r8188eu: Add files for new driver - part 4") Signed-off-by: Colin Ian King Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_mlme.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/staging/rtl8188eu/core/rtw_mlme.c b/drivers/staging/rtl8188eu/core/rtw_mlme.c index f663e6c41f8a..f6d71587b803 100644 --- a/drivers/staging/rtl8188eu/core/rtw_mlme.c +++ b/drivers/staging/rtl8188eu/core/rtw_mlme.c @@ -106,10 +106,10 @@ void rtw_free_mlme_priv_ie_data(struct mlme_priv *pmlmepriv) void rtw_free_mlme_priv(struct mlme_priv *pmlmepriv) { - rtw_free_mlme_priv_ie_data(pmlmepriv); - - if (pmlmepriv) + if (pmlmepriv) { + rtw_free_mlme_priv_ie_data(pmlmepriv); vfree(pmlmepriv->free_bss_buf); + } } struct wlan_network *_rtw_alloc_network(struct mlme_priv *pmlmepriv) From d9bf23504e6521af8cbce108761eb22fe9b63136 Mon Sep 17 00:00:00 2001 From: Hiromitsu Yamasaki Date: Thu, 2 Nov 2017 10:32:36 +0100 Subject: [PATCH 0122/1013] spi: sh-msiof: Fix DMA transfer size check [ Upstream commit 36735783fdb599c94b9c86824583df367c65900b ] DMA supports 32-bit words only, even if BITLEN1 of SITMDR2 register is 16bit. 
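In practice that means the transfer length must be a multiple of four bytes before DMA may be used; a minimal sketch of the check (hypothetical helper name, not the driver function):

#include <linux/types.h>

/* The DMA engine moves 32-bit words, so only 4-byte multiples qualify. */
static bool example_len_ok_for_dma(unsigned int len)
{
        return (len & 3) == 0;
}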
Fixes: b0d0ce8b6b91 ("spi: sh-msiof: Add DMA support") Signed-off-by: Hiromitsu Yamasaki Signed-off-by: Simon Horman Acked-by: Geert Uytterhoeven Acked-by: Dirk Behme Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi-sh-msiof.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/spi-sh-msiof.c b/drivers/spi/spi-sh-msiof.c index 0eb1e9583485..837bb95eea62 100644 --- a/drivers/spi/spi-sh-msiof.c +++ b/drivers/spi/spi-sh-msiof.c @@ -900,7 +900,7 @@ static int sh_msiof_transfer_one(struct spi_master *master, break; copy32 = copy_bswap32; } else if (bits <= 16) { - if (l & 1) + if (l & 3) break; copy32 = copy_wswap32; } else { From 4507d57777ff432efa66a3d212f35a6036015298 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 30 Oct 2017 11:35:27 +0100 Subject: [PATCH 0123/1013] spi: spi-axi: fix potential use-after-free after deregistration [ Upstream commit 4d5e0689dc9d5640ad46cdfbe1896b74d8df1661 ] Take an extra reference to the controller before deregistering it to prevent use-after-free in the interrupt handler in case an interrupt fires before the line is disabled. Fixes: b1353d1c1d45 ("spi: Add Analog Devices AXI SPI Engine controller support") Acked-by: Lars-Peter Clausen Signed-off-by: Johan Hovold Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi-axi-spi-engine.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/spi/spi-axi-spi-engine.c b/drivers/spi/spi-axi-spi-engine.c index 6ab4c7700228..68cfc351b47f 100644 --- a/drivers/spi/spi-axi-spi-engine.c +++ b/drivers/spi/spi-axi-spi-engine.c @@ -553,7 +553,7 @@ static int spi_engine_probe(struct platform_device *pdev) static int spi_engine_remove(struct platform_device *pdev) { - struct spi_master *master = platform_get_drvdata(pdev); + struct spi_master *master = spi_master_get(platform_get_drvdata(pdev)); struct spi_engine *spi_engine = spi_master_get_devdata(master); int irq = platform_get_irq(pdev, 0); @@ -561,6 +561,8 @@ static int spi_engine_remove(struct platform_device *pdev) free_irq(irq, master); + spi_master_put(master); + writel_relaxed(0xff, spi_engine->base + SPI_ENGINE_REG_INT_PENDING); writel_relaxed(0x00, spi_engine->base + SPI_ENGINE_REG_INT_ENABLE); writel_relaxed(0x01, spi_engine->base + SPI_ENGINE_REG_RESET); From 2ef27d564261b2ae9d7c39f9993da8dace2cb99b Mon Sep 17 00:00:00 2001 From: Fabrizio Castro Date: Fri, 22 Sep 2017 12:22:17 +0100 Subject: [PATCH 0124/1013] mmc: tmio: check mmc_regulator_get_supply return value [ Upstream commit a3d95d1d4007b1fefd6d8b12db26fda05de05cfb ] mmc_regulator_get_supply returns -EPROBE_DEFER if either vmmc or vqmmc regulators had their probing deferred. vqmmc regulator is needed by UHS to work properly, therefore this patch checks the value returned by mmc_regulator_get_supply to make sure we have a reference to both vmmc and vqmmc (if found in the DT). 
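The shape of the fix is the common probe-defer pattern, sketched here with a hypothetical function name (not the tmio code itself): any error from mmc_regulator_get_supply(), typically -EPROBE_DEFER, is propagated instead of being ignored:

#include <linux/errno.h>
#include <linux/mmc/host.h>

static int example_init_ocr(struct mmc_host *mmc)
{
        int err = mmc_regulator_get_supply(mmc);

        if (err)
                return err;     /* often -EPROBE_DEFER for vmmc/vqmmc */

        /* with no regulator found, a platform-provided mask must exist */
        if (!mmc->ocr_avail)
                return -EINVAL;
        return 0;
}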
Signed-off-by: Fabrizio Castro Reviewed-by: Wolfram Sang Tested-by: Wolfram Sang Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/tmio_mmc_core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/mmc/host/tmio_mmc_core.c b/drivers/mmc/host/tmio_mmc_core.c index 9c4e6199b854..3a6d49f07e22 100644 --- a/drivers/mmc/host/tmio_mmc_core.c +++ b/drivers/mmc/host/tmio_mmc_core.c @@ -1113,8 +1113,11 @@ static int tmio_mmc_init_ocr(struct tmio_mmc_host *host) { struct tmio_mmc_data *pdata = host->pdata; struct mmc_host *mmc = host->mmc; + int err; - mmc_regulator_get_supply(mmc); + err = mmc_regulator_get_supply(mmc); + if (err) + return err; /* use ocr_mask if no regulator */ if (!mmc->ocr_avail) From 69d02547cbdf3d571f0aba394ce4b90b5daf6b71 Mon Sep 17 00:00:00 2001 From: Subhash Jadavani Date: Wed, 27 Sep 2017 11:04:40 +0530 Subject: [PATCH 0125/1013] mmc: sdhci-msm: fix issue with power irq [ Upstream commit c7ccee224d2d551f712752c4a16947f6529d6506 ] SDCC controller reset (SW_RST) during probe may trigger power irq if previous status of PWRCTL was either BUS_ON or IO_HIGH_V. So before we enable the power irq interrupt in GIC (by registering the interrupt handler), we need to ensure that any pending power irq interrupt status is acknowledged otherwise power irq interrupt handler would be fired prematurely. Signed-off-by: Subhash Jadavani Signed-off-by: Vijay Viswanath Acked-by: Adrian Hunter Signed-off-by: Ulf Hansson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/host/sdhci-msm.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c index fc73e56eb1e2..92c483ec6cb2 100644 --- a/drivers/mmc/host/sdhci-msm.c +++ b/drivers/mmc/host/sdhci-msm.c @@ -1251,6 +1251,21 @@ static int sdhci_msm_probe(struct platform_device *pdev) CORE_VENDOR_SPEC_CAPABILITIES0); } + /* + * Power on reset state may trigger power irq if previous status of + * PWRCTL was either BUS_ON or IO_HIGH_V. So before enabling pwr irq + * interrupt in GIC, any pending power irq interrupt should be + * acknowledged. Otherwise power irq interrupt handler would be + * fired prematurely. + */ + sdhci_msm_voltage_switch(host); + + /* + * Ensure that above writes are propogated before interrupt enablement + * in GIC. + */ + mb(); + /* Setup IRQ for handling power/voltage tasks with PMIC */ msm_host->pwr_irq = platform_get_irq_byname(pdev, "pwr_irq"); if (msm_host->pwr_irq < 0) { @@ -1260,6 +1275,9 @@ static int sdhci_msm_probe(struct platform_device *pdev) goto clk_disable; } + /* Enable pwr irq interrupts */ + writel_relaxed(INT_MASK, msm_host->core_mem + CORE_PWRCTL_MASK); + ret = devm_request_threaded_irq(&pdev->dev, msm_host->pwr_irq, NULL, sdhci_msm_pwr_irq, IRQF_ONESHOT, dev_name(&pdev->dev), host); From 9851961159b6d33b7523e73cdedf156204a731bd Mon Sep 17 00:00:00 2001 From: "Edward A. James" Date: Fri, 27 Oct 2017 11:55:05 -0500 Subject: [PATCH 0126/1013] hwmon: (pmbus/core) Prevent unintentional setting of page to 0xFF [ Upstream commit 6dcf2fb5e8db3704f50af1f198256cb4e2453f8b ] The pmbus core may call read/write word data functions with a page value of -1, intending to perform the operation without setting the page. However, the read/write word data functions accept only unsigned 8-bit page numbers, and therefore cannot check for negative page number to avoid setting the page. This results in setting the page number to 0xFF. 
This may result in errors or undefined behavior of some devices (specifically the ir35221, which allows the page to be set to 0xFF, but some subsequent operations to read registers may fail). Switch the pmbus_set_page page parameter to an integer and perform the check for negative page there. Make read/write functions consistent in accepting an integer page number parameter. Signed-off-by: Edward A. James Fixes: cbcdec6202c9 ("hwmon: (pmbus): Access word data for STATUS_WORD") Signed-off-by: Guenter Roeck Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/hwmon/pmbus/pmbus.h | 6 +++--- drivers/hwmon/pmbus/pmbus_core.c | 25 +++++++++++-------------- 2 files changed, 14 insertions(+), 17 deletions(-) diff --git a/drivers/hwmon/pmbus/pmbus.h b/drivers/hwmon/pmbus/pmbus.h index 4efa2bd4f6d8..fa613bd209e3 100644 --- a/drivers/hwmon/pmbus/pmbus.h +++ b/drivers/hwmon/pmbus/pmbus.h @@ -404,9 +404,9 @@ extern const struct regulator_ops pmbus_regulator_ops; /* Function declarations */ void pmbus_clear_cache(struct i2c_client *client); -int pmbus_set_page(struct i2c_client *client, u8 page); -int pmbus_read_word_data(struct i2c_client *client, u8 page, u8 reg); -int pmbus_write_word_data(struct i2c_client *client, u8 page, u8 reg, u16 word); +int pmbus_set_page(struct i2c_client *client, int page); +int pmbus_read_word_data(struct i2c_client *client, int page, u8 reg); +int pmbus_write_word_data(struct i2c_client *client, int page, u8 reg, u16 word); int pmbus_read_byte_data(struct i2c_client *client, int page, u8 reg); int pmbus_write_byte(struct i2c_client *client, int page, u8 value); int pmbus_write_byte_data(struct i2c_client *client, int page, u8 reg, diff --git a/drivers/hwmon/pmbus/pmbus_core.c b/drivers/hwmon/pmbus/pmbus_core.c index 302f0aef59de..52a58b8b6e1b 100644 --- a/drivers/hwmon/pmbus/pmbus_core.c +++ b/drivers/hwmon/pmbus/pmbus_core.c @@ -136,13 +136,13 @@ void pmbus_clear_cache(struct i2c_client *client) } EXPORT_SYMBOL_GPL(pmbus_clear_cache); -int pmbus_set_page(struct i2c_client *client, u8 page) +int pmbus_set_page(struct i2c_client *client, int page) { struct pmbus_data *data = i2c_get_clientdata(client); int rv = 0; int newpage; - if (page != data->currpage) { + if (page >= 0 && page != data->currpage) { rv = i2c_smbus_write_byte_data(client, PMBUS_PAGE, page); newpage = i2c_smbus_read_byte_data(client, PMBUS_PAGE); if (newpage != page) @@ -158,11 +158,9 @@ int pmbus_write_byte(struct i2c_client *client, int page, u8 value) { int rv; - if (page >= 0) { - rv = pmbus_set_page(client, page); - if (rv < 0) - return rv; - } + rv = pmbus_set_page(client, page); + if (rv < 0) + return rv; return i2c_smbus_write_byte(client, value); } @@ -186,7 +184,8 @@ static int _pmbus_write_byte(struct i2c_client *client, int page, u8 value) return pmbus_write_byte(client, page, value); } -int pmbus_write_word_data(struct i2c_client *client, u8 page, u8 reg, u16 word) +int pmbus_write_word_data(struct i2c_client *client, int page, u8 reg, + u16 word) { int rv; @@ -219,7 +218,7 @@ static int _pmbus_write_word_data(struct i2c_client *client, int page, int reg, return pmbus_write_word_data(client, page, reg, word); } -int pmbus_read_word_data(struct i2c_client *client, u8 page, u8 reg) +int pmbus_read_word_data(struct i2c_client *client, int page, u8 reg) { int rv; @@ -255,11 +254,9 @@ int pmbus_read_byte_data(struct i2c_client *client, int page, u8 reg) { int rv; - if (page >= 0) { - rv = pmbus_set_page(client, page); - if (rv < 0) - return rv; - } + rv = pmbus_set_page(client, 
page); + if (rv < 0) + return rv; return i2c_smbus_read_byte_data(client, reg); } From 4ee9572b109de66357a73e27effed5266a449183 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Sep 2017 13:38:24 +0200 Subject: [PATCH 0127/1013] perf/core: Fix __perf_read_group_add() locking [ Upstream commit a9cd8194e1e6bd09619954721dfaf0f94fe2003e ] Event timestamps are serialized using ctx->lock, make sure to hold it over reading all values. Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/events/core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index 10cdb9c26b5d..4f1d4bfc607a 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4433,6 +4433,8 @@ static int __perf_read_group_add(struct perf_event *leader, if (ret) return ret; + raw_spin_lock_irqsave(&ctx->lock, flags); + /* * Since we co-schedule groups, {enabled,running} times of siblings * will be identical to those of the leader, so we only publish one @@ -4455,8 +4457,6 @@ static int __perf_read_group_add(struct perf_event *leader, if (read_format & PERF_FORMAT_ID) values[n++] = primary_event_id(leader); - raw_spin_lock_irqsave(&ctx->lock, flags); - list_for_each_entry(sub, &leader->sibling_list, group_entry) { values[n++] += perf_event_count(sub); if (read_format & PERF_FORMAT_ID) From 6cac6a26e3c76c310e33172133e4621b38143244 Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Sat, 21 Oct 2017 01:02:07 +0300 Subject: [PATCH 0128/1013] usb: phy: tahvo: fix error handling in tahvo_usb_probe() [ Upstream commit ce035409bfa892a2fabb89720b542e1b335c3426 ] If devm_extcon_dev_allocate() fails, we should disable clk before return. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Fixes: 860d2686fda7 ("usb: phy: tahvo: Use devm_extcon_dev_[allocate|register]() and replace deprecated API") Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/phy/phy-tahvo.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/phy/phy-tahvo.c b/drivers/usb/phy/phy-tahvo.c index 8babd318c0ed..1ec00eae339a 100644 --- a/drivers/usb/phy/phy-tahvo.c +++ b/drivers/usb/phy/phy-tahvo.c @@ -368,7 +368,8 @@ static int tahvo_usb_probe(struct platform_device *pdev) tu->extcon = devm_extcon_dev_allocate(&pdev->dev, tahvo_cable); if (IS_ERR(tu->extcon)) { dev_err(&pdev->dev, "failed to allocate memory for extcon\n"); - return -ENOMEM; + ret = PTR_ERR(tu->extcon); + goto err_disable_clk; } ret = devm_extcon_dev_register(&pdev->dev, tu->extcon); From dd89f47e21aa3166d557f348deae1affde54f8b8 Mon Sep 17 00:00:00 2001 From: Kishon Vijay Abraham I Date: Mon, 9 Oct 2017 14:33:37 +0530 Subject: [PATCH 0129/1013] PCI: dra7xx: Create functional dependency between PCIe and PHY [ Upstream commit 7a4db656a6350f8dd46f711bdef3b0e9c6e3f4cb ] PCI core access configuration space registers in resume_noirq callbacks. In the case of dra7xx, PIPE3 PHY connected to PCIe controller has to be enabled before accessing configuration space registers. Since PIPE3 PHY is enabled by only configuring control module registers, no aborts has been observed so far (though during noirq stage, interface clock of PIPE3 PHY is not enabled). 
With new TRM updates, PIPE3 PHY has to be initialized (PIPE3 PHY registers has to be accessed) as well which requires the interface clock of PIPE3 PHY to be enabled. The interface clock of PIPE3 PHY is derived from OCP2SCP and hence PCIe PHY is modeled as a child of OCP2SCP. Since pm_runtime is not enabled during noirq stage, pm_runtime_get_sync done in phy_init doesn't enable OCP2SCP clocks resulting in abort when PIPE3 PHY registers are accessed. Create a function dependency between PCIe and PHY here to make sure PCIe is suspended before PCIe PHY/OCP2SCP and resumed after PCIe PHY/OCP2SCP. Suggested-by: Grygorii Strashko Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Sekhar Nori Acked-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/dwc/pci-dra7xx.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/pci/dwc/pci-dra7xx.c b/drivers/pci/dwc/pci-dra7xx.c index 34427a6a15af..362607f727ee 100644 --- a/drivers/pci/dwc/pci-dra7xx.c +++ b/drivers/pci/dwc/pci-dra7xx.c @@ -11,6 +11,7 @@ */ #include +#include #include #include #include @@ -594,6 +595,7 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev) int i; int phy_count; struct phy **phy; + struct device_link **link; void __iomem *base; struct resource *res; struct dw_pcie *pci; @@ -649,11 +651,21 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev) if (!phy) return -ENOMEM; + link = devm_kzalloc(dev, sizeof(*link) * phy_count, GFP_KERNEL); + if (!link) + return -ENOMEM; + for (i = 0; i < phy_count; i++) { snprintf(name, sizeof(name), "pcie-phy%d", i); phy[i] = devm_phy_get(dev, name); if (IS_ERR(phy[i])) return PTR_ERR(phy[i]); + + link[i] = device_link_add(dev, &phy[i]->dev, DL_FLAG_STATELESS); + if (!link[i]) { + ret = -EINVAL; + goto err_link; + } } dra7xx->base = base; @@ -732,6 +744,10 @@ static int __init dra7xx_pcie_probe(struct platform_device *pdev) pm_runtime_disable(dev); dra7xx_pcie_disable_phy(dra7xx); +err_link: + while (--i >= 0) + device_link_del(link[i]); + return ret; } From 24567dc3cc8a9fb6f7ac7e9a5040e839efe3716c Mon Sep 17 00:00:00 2001 From: Reinette Chatre Date: Fri, 20 Oct 2017 02:16:57 -0700 Subject: [PATCH 0130/1013] x86/intel_rdt: Initialize bitmask of shareable resource if CDP enabled [ Upstream commit 95953034fb24c16ad0047a98b16427e5935830c4 ] The platform informs via CPUID.(EAX=0x10, ECX=res#):EBX[31:0] (valid res# are only 1 for L3 and 2 for L2) which unit of the allocation may be used by other entities in the platform. This information is valid whether CDP (Code and Data Prioritization) is enabled or not. Ensure that the bitmask of shareable resource is initialized when CDP is enabled. 
Fixes: 0dd2d7494cd8 ("x86/intel_rdt: Show bitmask of shareable resource with other executing units" Signed-off-by: Reinette Chatre Signed-off-by: Thomas Gleixner Acked-by: Fenghua Yu Acked-by: Vikas Shivappa Acked-by: Tony Luck Link: https://lkml.kernel.org/r/815747bddc820ca221a8924edaf4d1a7324547e4.1508490116.git.reinette.chatre@intel.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/intel_rdt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c index cd5fc61ba450..88dcf8479013 100644 --- a/arch/x86/kernel/cpu/intel_rdt.c +++ b/arch/x86/kernel/cpu/intel_rdt.c @@ -267,6 +267,7 @@ static void rdt_get_cdp_l3_config(int type) r->num_closid = r_l3->num_closid / 2; r->cache.cbm_len = r_l3->cache.cbm_len; r->default_ctrl = r_l3->default_ctrl; + r->cache.shareable_bits = r_l3->cache.shareable_bits; r->data_width = (r->cache.cbm_len + 3) / 4; r->alloc_capable = true; /* From 6c5faa44569688d8b984059def5846e9f15b2cff Mon Sep 17 00:00:00 2001 From: Reinette Chatre Date: Fri, 20 Oct 2017 02:16:59 -0700 Subject: [PATCH 0131/1013] x86/intel_rdt: Fix potential deadlock during resctrl mount [ Upstream commit 87943db7dfb0c5ee5aa74a9ac06346fadd9695c8 ] Sai reported a warning during some MBA tests: [ 236.755559] ====================================================== [ 236.762443] WARNING: possible circular locking dependency detected [ 236.769328] 4.14.0-rc4-yocto-standard #8 Not tainted [ 236.774857] ------------------------------------------------------ [ 236.781738] mount/10091 is trying to acquire lock: [ 236.787071] (cpu_hotplug_lock.rw_sem){++++}, at: [] static_key_enable+0x12/0x30 [ 236.797058] but task is already holding lock: [ 236.803552] (&type->s_umount_key#37/1){+.+.}, at: [] sget_userns+0x32f/0x520 [ 236.813247] which lock already depends on the new lock. 
[ 236.822353] the existing dependency chain (in reverse order) is: [ 236.830686] -> #4 (&type->s_umount_key#37/1){+.+.}: [ 236.837756] __lock_acquire+0x1100/0x11a0 [ 236.842799] lock_acquire+0xdf/0x1d0 [ 236.847363] down_write_nested+0x46/0x80 [ 236.852310] sget_userns+0x32f/0x520 [ 236.856873] kernfs_mount_ns+0x7e/0x1f0 [ 236.861728] rdt_mount+0x30c/0x440 [ 236.866096] mount_fs+0x38/0x150 [ 236.870262] vfs_kern_mount+0x67/0x150 [ 236.875015] do_mount+0x1df/0xd50 [ 236.879286] SyS_mount+0x95/0xe0 [ 236.883464] entry_SYSCALL_64_fastpath+0x18/0xad [ 236.889183] -> #3 (rdtgroup_mutex){+.+.}: [ 236.895292] __lock_acquire+0x1100/0x11a0 [ 236.900337] lock_acquire+0xdf/0x1d0 [ 236.904899] __mutex_lock+0x80/0x8f0 [ 236.909459] mutex_lock_nested+0x1b/0x20 [ 236.914407] intel_rdt_online_cpu+0x3b/0x4a0 [ 236.919745] cpuhp_invoke_callback+0xce/0xb80 [ 236.925177] cpuhp_thread_fun+0x1c5/0x230 [ 236.930222] smpboot_thread_fn+0x11a/0x1e0 [ 236.935362] kthread+0x152/0x190 [ 236.939536] ret_from_fork+0x27/0x40 [ 236.944097] -> #2 (cpuhp_state-up){+.+.}: [ 236.950199] __lock_acquire+0x1100/0x11a0 [ 236.955241] lock_acquire+0xdf/0x1d0 [ 236.959800] cpuhp_issue_call+0x12e/0x1c0 [ 236.964845] __cpuhp_setup_state_cpuslocked+0x13b/0x2f0 [ 236.971242] __cpuhp_setup_state+0xa7/0x120 [ 236.976483] page_writeback_init+0x43/0x67 [ 236.981623] pagecache_init+0x38/0x3b [ 236.986281] start_kernel+0x3c6/0x41a [ 236.990931] x86_64_start_reservations+0x2a/0x2c [ 236.996650] x86_64_start_kernel+0x72/0x75 [ 237.001793] verify_cpu+0x0/0xfb [ 237.005966] -> #1 (cpuhp_state_mutex){+.+.}: [ 237.012364] __lock_acquire+0x1100/0x11a0 [ 237.017408] lock_acquire+0xdf/0x1d0 [ 237.021969] __mutex_lock+0x80/0x8f0 [ 237.026527] mutex_lock_nested+0x1b/0x20 [ 237.031475] __cpuhp_setup_state_cpuslocked+0x54/0x2f0 [ 237.037777] __cpuhp_setup_state+0xa7/0x120 [ 237.043013] page_alloc_init+0x28/0x30 [ 237.047769] start_kernel+0x148/0x41a [ 237.052425] x86_64_start_reservations+0x2a/0x2c [ 237.058145] x86_64_start_kernel+0x72/0x75 [ 237.063284] verify_cpu+0x0/0xfb [ 237.067456] -> #0 (cpu_hotplug_lock.rw_sem){++++}: [ 237.074436] check_prev_add+0x401/0x800 [ 237.079286] __lock_acquire+0x1100/0x11a0 [ 237.084330] lock_acquire+0xdf/0x1d0 [ 237.088890] cpus_read_lock+0x42/0x90 [ 237.093548] static_key_enable+0x12/0x30 [ 237.098496] rdt_mount+0x406/0x440 [ 237.102862] mount_fs+0x38/0x150 [ 237.107035] vfs_kern_mount+0x67/0x150 [ 237.111787] do_mount+0x1df/0xd50 [ 237.116058] SyS_mount+0x95/0xe0 [ 237.120233] entry_SYSCALL_64_fastpath+0x18/0xad [ 237.125952] other info that might help us debug this: [ 237.134867] Chain exists of: cpu_hotplug_lock.rw_sem --> rdtgroup_mutex --> &type->s_umount_key#37/1 [ 237.148425] Possible unsafe locking scenario: [ 237.155015] CPU0 CPU1 [ 237.160057] ---- ---- [ 237.165100] lock(&type->s_umount_key#37/1); [ 237.169952] lock(rdtgroup_mutex); [ 237.176641] lock(&type->s_umount_key#37/1); [ 237.184287] lock(cpu_hotplug_lock.rw_sem); [ 237.189041] *** DEADLOCK *** When the resctrl filesystem is mounted the locks must be acquired in the same order as was done when the cpus came online: cpu_hotplug_lock before rdtgroup_mutex. This also requires to switch the static_branch_enable() calls to the _cpulocked variant because now cpu hotplug lock is held already. 
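The rule the fix enforces is to take the locks in the same order as the CPU online path does: cpu_hotplug_lock first, then rdtgroup_mutex, and to use the *_cpuslocked static-key helpers while the hotplug lock is held. A rough sketch of that shape, assuming the names from the diff below and omitting error handling:

static int example_mount_locked(void)
{
	cpus_read_lock();			/* cpu_hotplug_lock first ... */
	mutex_lock(&rdtgroup_mutex);		/* ... then rdtgroup_mutex */

	/* ... build the resctrl hierarchy ... */

	if (rdt_alloc_capable || rdt_mon_capable)
		static_branch_enable_cpuslocked(&rdt_enable_key);

	mutex_unlock(&rdtgroup_mutex);
	cpus_read_unlock();
	return 0;
}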
[ tglx: Switched to cpus_read_[un]lock ] Reported-by: Sai Praneeth Prakhya Signed-off-by: Reinette Chatre Tested-by: Sai Praneeth Prakhya Acked-by: Vikas Shivappa Cc: fenghua.yu@intel.com Cc: tony.luck@intel.com Link: https://lkml.kernel.org/r/9c41b91bc2f47d9e95b62b213ecdb45623c47a9f.1508490116.git.reinette.chatre@intel.com Signed-off-by: Thomas Gleixner Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index a869d4a073c5..3d433af856a5 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -1081,6 +1081,7 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type, struct dentry *dentry; int ret; + cpus_read_lock(); mutex_lock(&rdtgroup_mutex); /* * resctrl file system can only be mounted once. @@ -1130,12 +1131,12 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type, goto out_mondata; if (rdt_alloc_capable) - static_branch_enable(&rdt_alloc_enable_key); + static_branch_enable_cpuslocked(&rdt_alloc_enable_key); if (rdt_mon_capable) - static_branch_enable(&rdt_mon_enable_key); + static_branch_enable_cpuslocked(&rdt_mon_enable_key); if (rdt_alloc_capable || rdt_mon_capable) - static_branch_enable(&rdt_enable_key); + static_branch_enable_cpuslocked(&rdt_enable_key); if (is_mbm_enabled()) { r = &rdt_resources_all[RDT_RESOURCE_L3]; @@ -1157,6 +1158,7 @@ static struct dentry *rdt_mount(struct file_system_type *fs_type, cdp_disable(); out: mutex_unlock(&rdtgroup_mutex); + cpus_read_unlock(); return dentry; } From c36100e295ffc4d957b0884a82bd32ac5a135c51 Mon Sep 17 00:00:00 2001 From: Aaron Sierra Date: Wed, 4 Oct 2017 10:01:28 -0500 Subject: [PATCH 0132/1013] serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X [ Upstream commit 0ab84da2e076948c49d36197ee7d254125c53eab ] The upper four bits of the XR17V35x fractional divisor register (DLD) control general chip function (RS-485 direction pin polarity, multidrop mode, XON/XOFF parity check, and fast IR mode). Don't allow these bits to be clobbered when setting the baudrate. Signed-off-by: Aaron Sierra Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/8250/8250_port.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index f0cc04f62b67..8dcfd4978a03 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -2586,8 +2586,11 @@ static void serial8250_set_divisor(struct uart_port *port, unsigned int baud, serial_dl_write(up, quot); /* XR17V35x UARTs have an extra fractional divisor register (DLD) */ - if (up->port.type == PORT_XR17V35X) + if (up->port.type == PORT_XR17V35X) { + /* Preserve bits not related to baudrate; DLD[7:4]. */ + quot_frac |= serial_port_in(port, 0x2) & 0xf0; serial_port_out(port, 0x2, quot_frac); + } } static unsigned int serial8250_get_baud_rate(struct uart_port *port, From 5715de464ac8ef97fdc0daacd41ac35dcd412e2f Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Fri, 20 Oct 2017 08:43:39 +0900 Subject: [PATCH 0133/1013] kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y [ Upstream commit a30b85df7d599f626973e9cd3056fe755bd778e0 ] We want to wait for all potentially preempted kprobes trampoline execution to have completed. 
This guarantees that any freed trampoline memory is not in use by any task in the system anymore. synchronize_rcu_tasks() gives such a guarantee, so use it. Also, this guarantees to wait for all potentially preempted tasks on the instructions which will be replaced with a jump. Since this becomes a problem only when CONFIG_PREEMPT=y, enable CONFIG_TASKS_RCU=y for synchronize_rcu_tasks() in that case. Signed-off-by: Masami Hiramatsu Acked-by: Paul E. McKenney Cc: Ananth N Mavinakayanahalli Cc: Linus Torvalds Cc: Naveen N . Rao Cc: Paul E . McKenney Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/150845661962.5443.17724352636247312231.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/Kconfig | 2 +- kernel/kprobes.c | 14 ++++++++------ 2 files changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index 057370a0ac4e..400b9e1b2f27 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -91,7 +91,7 @@ config STATIC_KEYS_SELFTEST config OPTPROBES def_bool y depends on KPROBES && HAVE_OPTPROBES - depends on !PREEMPT + select TASKS_RCU if PREEMPT config KPROBES_ON_FTRACE def_bool y diff --git a/kernel/kprobes.c b/kernel/kprobes.c index a1606a4224e1..a66e838640ea 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -573,13 +573,15 @@ static void kprobe_optimizer(struct work_struct *work) do_unoptimize_kprobes(); /* - * Step 2: Wait for quiesence period to ensure all running interrupts - * are done. Because optprobe may modify multiple instructions - * there is a chance that Nth instruction is interrupted. In that - * case, running interrupt can return to 2nd-Nth byte of jump - * instruction. This wait is for avoiding it. + * Step 2: Wait for quiesence period to ensure all potentially + * preempted tasks to have normally scheduled. Because optprobe + * may modify multiple instructions, there is a chance that Nth + * instruction is preempted. In that case, such tasks can return + * to 2nd-Nth byte of jump instruction. This wait is for avoiding it. + * Note that on non-preemptive kernel, this is transparently converted + * to synchronoze_sched() to wait for all interrupts to have completed. */ - synchronize_sched(); + synchronize_rcu_tasks(); /* Step 3: Optimize kprobes after quiesence period */ do_optimize_kprobes(); From ec2d8c832343de707fd777f29d5916b4eb6fea28 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Wed, 18 Oct 2017 10:21:07 -0700 Subject: [PATCH 0134/1013] x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt() [ Upstream commit da20ab35180780e4a6eadc804544f1fa967f3567 ] We do not have tracepoints for sys_modify_ldt() because we define it directly instead of using the normal SYSCALL_DEFINEx() macros. However, there is a reason sys_modify_ldt() does not use the macros: it has an 'int' return type instead of 'unsigned long'. This is a bug, but it's a bug cemented in the ABI. What does this mean? If we return -EINVAL from a function that returns 'int', we have 0x00000000ffffffea in %rax. But, if we return -EINVAL from a function returning 'unsigned long', we end up with 0xffffffffffffffea in %rax, which is wrong. To work around this and maintain the 'int' behavior while using the SYSCALL_DEFINEx() macros, so we add a cast to 'unsigned int' in both implementations of sys_modify_ldt(). 
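The register values quoted above follow from plain C integer conversions; the userspace illustration below (not kernel code, values taken from the changelog) shows how the same -EINVAL widens differently depending on whether it passes through an 'int' or a 'long':

#include <stdio.h>

int main(void)
{
	int ret_int = -22;	/* -EINVAL seen through an 'int' return type */
	long ret_long = -22;	/* -EINVAL seen through a 'long' return type */

	/* zero-extended into a 64-bit register: 0x00000000ffffffea */
	printf("%#018lx\n", (unsigned long)(unsigned int)ret_int);
	/* sign-extended: 0xffffffffffffffea */
	printf("%#018lx\n", (unsigned long)ret_long);
	return 0;
}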
Signed-off-by: Dave Hansen Reviewed-by: Andy Lutomirski Reviewed-by: Brian Gerst Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20171018172107.1A79C532@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/syscalls.h | 2 +- arch/x86/kernel/ldt.c | 16 +++++++++++++--- arch/x86/um/ldt.c | 7 +++++-- 3 files changed, 19 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/syscalls.h b/arch/x86/include/asm/syscalls.h index 91dfcafe27a6..bad25bb80679 100644 --- a/arch/x86/include/asm/syscalls.h +++ b/arch/x86/include/asm/syscalls.h @@ -21,7 +21,7 @@ asmlinkage long sys_ioperm(unsigned long, unsigned long, int); asmlinkage long sys_iopl(unsigned int); /* kernel/ldt.c */ -asmlinkage int sys_modify_ldt(int, void __user *, unsigned long); +asmlinkage long sys_modify_ldt(int, void __user *, unsigned long); /* kernel/signal.c */ asmlinkage long sys_rt_sigreturn(void); diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 4d17bacf4030..ae5615b03def 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include #include @@ -295,8 +296,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) return error; } -asmlinkage int sys_modify_ldt(int func, void __user *ptr, - unsigned long bytecount) +SYSCALL_DEFINE3(modify_ldt, int , func , void __user * , ptr , + unsigned long , bytecount) { int ret = -ENOSYS; @@ -314,5 +315,14 @@ asmlinkage int sys_modify_ldt(int func, void __user *ptr, ret = write_ldt(ptr, bytecount, 0); break; } - return ret; + /* + * The SYSCALL_DEFINE() macros give us an 'unsigned long' + * return type, but tht ABI for sys_modify_ldt() expects + * 'int'. This cast gives us an int-sized value in %rax + * for the return code. The 'unsigned' is necessary so + * the compiler does not try to sign-extend the negative + * return codes into the high half of the register when + * taking the value from int->long. + */ + return (unsigned int)ret; } diff --git a/arch/x86/um/ldt.c b/arch/x86/um/ldt.c index 836a1eb5df43..3ee234b6234d 100644 --- a/arch/x86/um/ldt.c +++ b/arch/x86/um/ldt.c @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -369,7 +370,9 @@ void free_ldt(struct mm_context *mm) mm->arch.ldt.entry_count = 0; } -int sys_modify_ldt(int func, void __user *ptr, unsigned long bytecount) +SYSCALL_DEFINE3(modify_ldt, int , func , void __user * , ptr , + unsigned long , bytecount) { - return do_modify_ldt_skas(func, ptr, bytecount); + /* See non-um modify_ldt() for why we do this cast */ + return (unsigned int)do_modify_ldt_skas(func, ptr, bytecount); } From ebbd9c27dcf71d2947e30d743cfc2c15e7d8f723 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 16 Oct 2017 16:28:38 +0100 Subject: [PATCH 0135/1013] clocksource/drivers/arm_arch_timer: Validate CNTFRQ after enabling frame [ Upstream commit 21492e1333a0d07af6968667f128e19088cf5ead ] The ACPI GTDT code validates the CNTFRQ field of each MMIO timer frame against the CNTFRQ system register of the current CPU, to ensure that they are equal, which is mandated by the architecture. 
However, reading the CNTFRQ field of a frame is not possible until the RFRQ bit in the frame's CNTACRn register is set, and doing so before that willl produce the following error: arch_timer: [Firmware Bug]: CNTFRQ mismatch: frame @ 0x00000000e0be0000: (0x00000000), CPU: (0x0ee6b280) arch_timer: Disabling MMIO timers due to CNTFRQ mismatch arch_timer: Failed to initialize memory-mapped timer. The reason is that the CNTFRQ field is RES0 if access is not enabled. So move the validation of CNTFRQ into the loop that iterates over the timers to find the best frame, but defer it until after we have selected the best frame, which should also have enabled the RFRQ bit. Signed-off-by: Ard Biesheuvel Signed-off-by: Mark Rutland Signed-off-by: Daniel Lezcano Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clocksource/arm_arch_timer.c | 38 +++++++++++++++------------- 1 file changed, 21 insertions(+), 17 deletions(-) diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index fd4b7f684bd0..14e2419063e9 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -1268,10 +1268,6 @@ arch_timer_mem_find_best_frame(struct arch_timer_mem *timer_mem) iounmap(cntctlbase); - if (!best_frame) - pr_err("Unable to find a suitable frame in timer @ %pa\n", - &timer_mem->cntctlbase); - return best_frame; } @@ -1372,6 +1368,8 @@ static int __init arch_timer_mem_of_init(struct device_node *np) frame = arch_timer_mem_find_best_frame(timer_mem); if (!frame) { + pr_err("Unable to find a suitable frame in timer @ %pa\n", + &timer_mem->cntctlbase); ret = -EINVAL; goto out; } @@ -1420,7 +1418,7 @@ arch_timer_mem_verify_cntfrq(struct arch_timer_mem *timer_mem) static int __init arch_timer_mem_acpi_init(int platform_timer_count) { struct arch_timer_mem *timers, *timer; - struct arch_timer_mem_frame *frame; + struct arch_timer_mem_frame *frame, *best_frame = NULL; int timer_count, i, ret = 0; timers = kcalloc(platform_timer_count, sizeof(*timers), @@ -1432,14 +1430,6 @@ static int __init arch_timer_mem_acpi_init(int platform_timer_count) if (ret || !timer_count) goto out; - for (i = 0; i < timer_count; i++) { - ret = arch_timer_mem_verify_cntfrq(&timers[i]); - if (ret) { - pr_err("Disabling MMIO timers due to CNTFRQ mismatch\n"); - goto out; - } - } - /* * While unlikely, it's theoretically possible that none of the frames * in a timer expose the combination of feature we want. @@ -1448,12 +1438,26 @@ static int __init arch_timer_mem_acpi_init(int platform_timer_count) timer = &timers[i]; frame = arch_timer_mem_find_best_frame(timer); - if (frame) - break; + if (!best_frame) + best_frame = frame; + + ret = arch_timer_mem_verify_cntfrq(timer); + if (ret) { + pr_err("Disabling MMIO timers due to CNTFRQ mismatch\n"); + goto out; + } + + if (!best_frame) /* implies !frame */ + /* + * Only complain about missing suitable frames if we + * haven't already found one in a previous iteration. 
+ */ + pr_err("Unable to find a suitable frame in timer @ %pa\n", + &timer->cntctlbase); } - if (frame) - ret = arch_timer_mem_frame_register(frame); + if (best_frame) + ret = arch_timer_mem_frame_register(best_frame); out: kfree(timers); return ret; From a105cd032d9d42a61aad547b4af283ead1c0d37b Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Mon, 18 Sep 2017 15:46:41 +0200 Subject: [PATCH 0136/1013] dt-bindings: timer: renesas, cmt: Fix SoC-specific compatible values [ Upstream commit e20824e944c3bf4352fcd8d9f446c41b53901e7b ] While the new family-specific compatible values introduced by commit 6f54cc1adcc8957f ("devicetree: bindings: R-Car Gen2 CMT0 and CMT1 bindings") use the recommended order ",-", the new SoC-specific compatible values still use the old and deprecated order ",-". Switch the SoC-specific compatible values to the recommended order while there are no upstream users of these compatible values yet. Fixes: 7f03a0ecfdc786c1 ("devicetree: bindings: r8a73a4 and R-Car Gen2 CMT bindings") Fixes: 63d9e8ca0dd4bfa4 ("devicetree: bindings: Deprecate property, update example") Signed-off-by: Geert Uytterhoeven Acked-by: Rob Herring Reviewed-by: Laurent Pinchart Signed-off-by: Daniel Lezcano Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- .../devicetree/bindings/timer/renesas,cmt.txt | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/Documentation/devicetree/bindings/timer/renesas,cmt.txt b/Documentation/devicetree/bindings/timer/renesas,cmt.txt index 6ca6b9e582a0..d740989eb569 100644 --- a/Documentation/devicetree/bindings/timer/renesas,cmt.txt +++ b/Documentation/devicetree/bindings/timer/renesas,cmt.txt @@ -20,16 +20,16 @@ Required Properties: (CMT1 on sh73a0 and r8a7740) This is a fallback for the above renesas,cmt-48-* entries. - - "renesas,cmt0-r8a73a4" for the 32-bit CMT0 device included in r8a73a4. - - "renesas,cmt1-r8a73a4" for the 48-bit CMT1 device included in r8a73a4. - - "renesas,cmt0-r8a7790" for the 32-bit CMT0 device included in r8a7790. - - "renesas,cmt1-r8a7790" for the 48-bit CMT1 device included in r8a7790. - - "renesas,cmt0-r8a7791" for the 32-bit CMT0 device included in r8a7791. - - "renesas,cmt1-r8a7791" for the 48-bit CMT1 device included in r8a7791. - - "renesas,cmt0-r8a7793" for the 32-bit CMT0 device included in r8a7793. - - "renesas,cmt1-r8a7793" for the 48-bit CMT1 device included in r8a7793. - - "renesas,cmt0-r8a7794" for the 32-bit CMT0 device included in r8a7794. - - "renesas,cmt1-r8a7794" for the 48-bit CMT1 device included in r8a7794. + - "renesas,r8a73a4-cmt0" for the 32-bit CMT0 device included in r8a73a4. + - "renesas,r8a73a4-cmt1" for the 48-bit CMT1 device included in r8a73a4. + - "renesas,r8a7790-cmt0" for the 32-bit CMT0 device included in r8a7790. + - "renesas,r8a7790-cmt1" for the 48-bit CMT1 device included in r8a7790. + - "renesas,r8a7791-cmt0" for the 32-bit CMT0 device included in r8a7791. + - "renesas,r8a7791-cmt1" for the 48-bit CMT1 device included in r8a7791. + - "renesas,r8a7793-cmt0" for the 32-bit CMT0 device included in r8a7793. + - "renesas,r8a7793-cmt1" for the 48-bit CMT1 device included in r8a7793. + - "renesas,r8a7794-cmt0" for the 32-bit CMT0 device included in r8a7794. + - "renesas,r8a7794-cmt1" for the 48-bit CMT1 device included in r8a7794. - "renesas,rcar-gen2-cmt0" for 32-bit CMT0 devices included in R-Car Gen2. - "renesas,rcar-gen2-cmt1" for 48-bit CMT1 devices included in R-Car Gen2. 
@@ -46,7 +46,7 @@ Required Properties: Example: R8A7790 (R-Car H2) CMT0 and CMT1 nodes cmt0: timer@ffca0000 { - compatible = "renesas,cmt0-r8a7790", "renesas,rcar-gen2-cmt0"; + compatible = "renesas,r8a7790-cmt0", "renesas,rcar-gen2-cmt0"; reg = <0 0xffca0000 0 0x1004>; interrupts = <0 142 IRQ_TYPE_LEVEL_HIGH>, <0 142 IRQ_TYPE_LEVEL_HIGH>; @@ -55,7 +55,7 @@ Example: R8A7790 (R-Car H2) CMT0 and CMT1 nodes }; cmt1: timer@e6130000 { - compatible = "renesas,cmt1-r8a7790", "renesas,rcar-gen2-cmt1"; + compatible = "renesas,r8a7790-cmt1", "renesas,rcar-gen2-cmt1"; reg = <0 0xe6130000 0 0x1004>; interrupts = <0 120 IRQ_TYPE_LEVEL_HIGH>, <0 121 IRQ_TYPE_LEVEL_HIGH>, From a95d0d2ee717908be9b67d1f1b2fb44e1c1210a1 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 16 Oct 2017 12:40:29 -0500 Subject: [PATCH 0137/1013] EDAC, sb_edac: Fix missing break in switch [ Upstream commit a8e9b186f153a44690ad0363a56716e7077ad28c ] Add missing break statement in order to prevent the code from falling through. Signed-off-by: Gustavo A. R. Silva Cc: Qiuxu Zhuo Cc: linux-edac Link: http://lkml.kernel.org/r/20171016174029.GA19757@embeddedor.com Signed-off-by: Borislav Petkov Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/edac/sb_edac.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/edac/sb_edac.c b/drivers/edac/sb_edac.c index 6241fa787d66..cd9d6ba03579 100644 --- a/drivers/edac/sb_edac.c +++ b/drivers/edac/sb_edac.c @@ -2498,6 +2498,7 @@ static int ibridge_mci_bind_devs(struct mem_ctl_info *mci, case PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_TA: case PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_TA: pvt->pci_ta = pdev; + break; case PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA0_RAS: case PCI_DEVICE_ID_INTEL_IBRIDGE_IMC_HA1_RAS: pvt->pci_ras = pdev; From 8891ee7e676c0adc59b5e6c298de0b109a1934d5 Mon Sep 17 00:00:00 2001 From: Chunfeng Yun Date: Fri, 13 Oct 2017 17:10:37 +0800 Subject: [PATCH 0138/1013] usb: mtu3: fix error return code in ssusb_gadget_init() [ Upstream commit c162ff0aaaac456ef29aebd1e9d4d3e305cd3279 ] When failing to get IRQ number, platform_get_irq() may return -EPROBE_DEFER, but we ignore it and always return -ENODEV, so fix it. Signed-off-by: Chunfeng Yun Signed-off-by: Felipe Balbi Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/mtu3/mtu3_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/mtu3/mtu3_core.c b/drivers/usb/mtu3/mtu3_core.c index 99c65b0788ff..947579842ad7 100644 --- a/drivers/usb/mtu3/mtu3_core.c +++ b/drivers/usb/mtu3/mtu3_core.c @@ -774,9 +774,9 @@ int ssusb_gadget_init(struct ssusb_mtk *ssusb) return -ENOMEM; mtu->irq = platform_get_irq(pdev, 0); - if (mtu->irq <= 0) { + if (mtu->irq < 0) { dev_err(dev, "fail to get irq number\n"); - return -ENODEV; + return mtu->irq; } dev_info(dev, "irq %d\n", mtu->irq); From 11d9a37aaf271de4dc9b8016182330ba8f06bf38 Mon Sep 17 00:00:00 2001 From: Ioana Radulescu Date: Wed, 11 Oct 2017 08:29:44 -0500 Subject: [PATCH 0139/1013] staging: fsl-dpaa2/eth: Account for Rx FD buffers on error path [ Upstream commit cbb3ea40fc495bf04070200b35c1c4cd05d11bd3 ] On Rx path, if we fail to build an skb from the incoming FD, we still need to update the channel buffer count accordingly, otherwise we risk depleting the pool while the software counter still sees available buffers. 
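The invariant behind the dpaa2-eth change is that ch->buf_count must drop by the number of hardware buffers consumed from the frame descriptor whether or not an skb could be built from them. A rough sketch of that rule with a hypothetical helper (the real driver change follows in the diff below):

/* Hypothetical helper, illustration only. */
static struct sk_buff *example_consume_rx_buf(struct dpaa2_eth_channel *ch,
					      void *vaddr, unsigned int size)
{
	struct sk_buff *skb;

	ch->buf_count--;	/* the buffer is consumed either way */

	skb = build_skb(vaddr, size);
	if (unlikely(!skb))
		return NULL;	/* caller drops the frame; counter is already consistent */

	return skb;
}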
Signed-off-by: Ioana Radulescu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c index 26017fe9df93..8e84b2e7f5bd 100644 --- a/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c +++ b/drivers/staging/fsl-dpaa2/ethernet/dpaa2-eth.c @@ -131,6 +131,8 @@ static struct sk_buff *build_linear_skb(struct dpaa2_eth_priv *priv, u16 fd_offset = dpaa2_fd_get_offset(fd); u32 fd_length = dpaa2_fd_get_len(fd); + ch->buf_count--; + skb = build_skb(fd_vaddr, DPAA2_ETH_RX_BUF_SIZE + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); if (unlikely(!skb)) @@ -139,8 +141,6 @@ static struct sk_buff *build_linear_skb(struct dpaa2_eth_priv *priv, skb_reserve(skb, fd_offset); skb_put(skb, fd_length); - ch->buf_count--; - return skb; } @@ -178,8 +178,15 @@ static struct sk_buff *build_frag_skb(struct dpaa2_eth_priv *priv, /* We build the skb around the first data buffer */ skb = build_skb(sg_vaddr, DPAA2_ETH_RX_BUF_SIZE + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); - if (unlikely(!skb)) - return NULL; + if (unlikely(!skb)) { + /* We still need to subtract the buffers used + * by this FD from our software counter + */ + while (!dpaa2_sg_is_final(&sgt[i]) && + i < DPAA2_ETH_MAX_SG_ENTRIES) + i++; + break; + } sg_offset = dpaa2_sg_get_offset(sge); skb_reserve(skb, sg_offset); From 2001a980d1ab676f07a455c728fa72ae86d40058 Mon Sep 17 00:00:00 2001 From: Larry Finger Date: Sat, 23 Sep 2017 19:36:04 -0500 Subject: [PATCH 0140/1013] staging: rtl8822be: Keep array subscript no lower than zero [ Upstream commit 43d15c2013130a9fa230c2f5203aca818ae0bb86 ] The kbuild test robot reports the following: drivers/staging//rtlwifi/phydm/phydm_dig.c: In function 'odm_pause_dig': drivers/staging//rtlwifi/phydm/phydm_dig.c:494:45: warning: array subscript is below array bounds [-Warray-bounds] odm_write_dig(dm, dig_tab->pause_dig_value[max_level]); This condition is caused when a loop falls through. The fix is to pin max_level to be >= 0. Signed-off-by: Larry Finger c: kbuild test robot Fixes: 9ce99b04b5b82fdf11e4c76b60a5f82c1e541297 staging: r8822be: Add phydm mini driver Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtlwifi/phydm/phydm_dig.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/staging/rtlwifi/phydm/phydm_dig.c b/drivers/staging/rtlwifi/phydm/phydm_dig.c index 31a4f3fcad19..c88b9788363a 100644 --- a/drivers/staging/rtlwifi/phydm/phydm_dig.c +++ b/drivers/staging/rtlwifi/phydm/phydm_dig.c @@ -490,6 +490,8 @@ void odm_pause_dig(void *dm_void, enum phydm_pause_type pause_type, break; } + /* pin max_level to be >= 0 */ + max_level = max_t(s8, 0, max_level); /* write IGI of lower level */ odm_write_dig(dm, dig_tab->pause_dig_value[max_level]); ODM_RT_TRACE(dm, ODM_COMP_DIG, From 8876436aa90d03372a2e6480b9a26d19d225603c Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Tue, 10 Oct 2017 13:47:55 +0800 Subject: [PATCH 0141/1013] ARM: cpuidle: Correct driver unregistration if init fails [ Upstream commit 0f87855d969a87f02048ff5ced7503465d5ab2f1 ] If cpuidle init fails, the code misses to unregister the driver for current CPU. 
Furthermore, we also need to rollback to cancel all previous CPUs registration; but the code retrieves driver handler by using function cpuidle_get_driver(), this function returns back current CPU driver handler but not previous CPU's handler, which leads to the failure handling code cannot unregister previous CPUs driver. This commit fixes two mentioned issues, it adds error handling path 'goto out_unregister_drv' for current CPU driver unregistration; and it is to replace cpuidle_get_driver() with cpuidle_get_cpu_driver(), the later function can retrieve driver handler for previous CPUs according to the CPU device handler so can unregister the driver properly. This patch also adds extra error handling paths 'goto out_kfree_dev' and 'goto out_kfree_drv' and adjusts the freeing sentences for previous CPUs; so make the code more readable for freeing 'dev' and 'drv' structures. Suggested-by: Daniel Lezcano Signed-off-by: Leo Yan Fixes: d50a7d8acd78 (ARM: cpuidle: Support asymmetric idle definition) Acked-by: Daniel Lezcano Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/cpuidle/cpuidle-arm.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-) diff --git a/drivers/cpuidle/cpuidle-arm.c b/drivers/cpuidle/cpuidle-arm.c index 52a75053ee03..f47c54546752 100644 --- a/drivers/cpuidle/cpuidle-arm.c +++ b/drivers/cpuidle/cpuidle-arm.c @@ -104,13 +104,13 @@ static int __init arm_idle_init(void) ret = dt_init_idle_driver(drv, arm_idle_state_match, 1); if (ret <= 0) { ret = ret ? : -ENODEV; - goto init_fail; + goto out_kfree_drv; } ret = cpuidle_register_driver(drv); if (ret) { pr_err("Failed to register cpuidle driver\n"); - goto init_fail; + goto out_kfree_drv; } /* @@ -128,14 +128,14 @@ static int __init arm_idle_init(void) if (ret) { pr_err("CPU %d failed to init idle CPU ops\n", cpu); - goto out_fail; + goto out_unregister_drv; } dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) { pr_err("Failed to allocate cpuidle device\n"); ret = -ENOMEM; - goto out_fail; + goto out_unregister_drv; } dev->cpu = cpu; @@ -143,21 +143,25 @@ static int __init arm_idle_init(void) if (ret) { pr_err("Failed to register cpuidle device for CPU %d\n", cpu); - kfree(dev); - goto out_fail; + goto out_kfree_dev; } } return 0; -init_fail: + +out_kfree_dev: + kfree(dev); +out_unregister_drv: + cpuidle_unregister_driver(drv); +out_kfree_drv: kfree(drv); out_fail: while (--cpu >= 0) { dev = per_cpu(cpuidle_devices, cpu); + drv = cpuidle_get_cpu_driver(dev); cpuidle_unregister_device(dev); - kfree(dev); - drv = cpuidle_get_driver(); cpuidle_unregister_driver(drv); + kfree(dev); kfree(drv); } From 9e99614575b9f1e66249729878a415b9b6659ee7 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Thu, 5 Oct 2017 11:21:43 +0300 Subject: [PATCH 0142/1013] usb: xhci: Return error when host is dead in xhci_disable_slot() [ Upstream commit dcabc76fa9361186e6b88c30a68db8fa9d5b4a1c ] xhci_disable_slot() is a helper for disabling a slot when a device goes away or recovers from error situations. Currently, it returns success when it sees a dead host. This is not the right way to go. It should return error and let the invoker know that disable slot command was failed due to a dead host. 
Fixes: f9e609b82479 ("usb: xhci: Add helper function xhci_disable_slot().") Cc: Guoqing Zhang Signed-off-by: Lu Baolu Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c index 51535ba2bcd4..e5677700dea4 100644 --- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -3583,10 +3583,9 @@ int xhci_disable_slot(struct xhci_hcd *xhci, struct xhci_command *command, state = readl(&xhci->op_regs->status); if (state == 0xffffffff || (xhci->xhc_state & XHCI_STATE_DYING) || (xhci->xhc_state & XHCI_STATE_HALTED)) { - xhci_free_virt_device(xhci, slot_id); spin_unlock_irqrestore(&xhci->lock, flags); kfree(command); - return ret; + return -ENODEV; } ret = xhci_queue_slot_control(xhci, command, TRB_DISABLE_SLOT, From dddd2bd98cb2a547c9fd08d68273c69d5bca3e1f Mon Sep 17 00:00:00 2001 From: Jibin Xu Date: Sun, 10 Sep 2017 20:11:42 -0700 Subject: [PATCH 0143/1013] sysrq : fix Show Regs call trace on ARM [ Upstream commit b00bebbc301c8e1f74f230dc82282e56b7e7a6db ] When kernel configuration SMP,PREEMPT and DEBUG_PREEMPT are enabled, echo 1 >/proc/sys/kernel/sysrq echo p >/proc/sysrq-trigger kernel will print call trace as below: sysrq: SysRq : Show Regs BUG: using __this_cpu_read() in preemptible [00000000] code: sh/435 caller is __this_cpu_preempt_check+0x18/0x20 Call trace: [] dump_backtrace+0x0/0x1d0 [] show_stack+0x24/0x30 [] dump_stack+0x90/0xb0 [] check_preemption_disabled+0x100/0x108 [] __this_cpu_preempt_check+0x18/0x20 [] sysrq_handle_showregs+0x1c/0x40 [] __handle_sysrq+0x12c/0x1a0 [] write_sysrq_trigger+0x60/0x70 [] proc_reg_write+0x90/0xd0 [] __vfs_write+0x48/0x90 [] vfs_write+0xa4/0x190 [] SyS_write+0x54/0xb0 [] el0_svc_naked+0x24/0x28 This can be seen on a common board like an r-pi3. This happens because when echo p >/proc/sysrq-trigger, get_irq_regs() is called outside of IRQ context, if preemption is enabled in this situation,kernel will print the call trace. Since many prior discussions on the mailing lists have made it clear that get_irq_regs either just returns NULL or stale data when used outside of IRQ context,we simply avoid calling it outside of IRQ context. 
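The rule being applied in the sysrq fix is to trust get_irq_regs() only in hard-interrupt context. A minimal sketch of that guard with a generic handler (the actual sysrq change is in the diff below):

static void example_show_regs(void)
{
	struct pt_regs *regs = NULL;

	if (in_irq())			/* saved registers are only meaningful here */
		regs = get_irq_regs();

	if (regs)
		show_regs(regs);
	/* otherwise fall back to whatever other output the handler provides */
}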
Signed-off-by: Jibin Xu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/sysrq.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/tty/sysrq.c b/drivers/tty/sysrq.c index d008f5a75197..377b3592384e 100644 --- a/drivers/tty/sysrq.c +++ b/drivers/tty/sysrq.c @@ -246,8 +246,10 @@ static void sysrq_handle_showallcpus(int key) * architecture has no support for it: */ if (!trigger_all_cpu_backtrace()) { - struct pt_regs *regs = get_irq_regs(); + struct pt_regs *regs = NULL; + if (in_irq()) + regs = get_irq_regs(); if (regs) { pr_info("CPU%d:\n", smp_processor_id()); show_regs(regs); @@ -266,7 +268,10 @@ static struct sysrq_key_op sysrq_showallcpus_op = { static void sysrq_handle_showregs(int key) { - struct pt_regs *regs = get_irq_regs(); + struct pt_regs *regs = NULL; + + if (in_irq()) + regs = get_irq_regs(); if (regs) show_regs(regs); perf_event_print_debug(); From 25aafb3061f64099e379dfb24e394b07f3f93b38 Mon Sep 17 00:00:00 2001 From: Andy Lowe Date: Fri, 22 Sep 2017 20:29:30 +0200 Subject: [PATCH 0144/1013] serial: sh-sci: suppress warning for ports without dma channels [ Upstream commit 7464779fa8551b90d5797d4020b0bdb7e6422eb9 ] If a port has no dma channel defined in the device tree, then don't attempt to allocate a dma channel for the port. Also suppress the warning message concerning the failure to allocate a dma channel. Continue to emit the warning message if a dma channel is defined but cannot be allocated. Signed-off-by: Andy Lowe Signed-off-by: Eugeniu Rosca Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/sh-sci.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c index 784dd42002ea..761b9f5f1491 100644 --- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -1491,6 +1491,14 @@ static void sci_request_dma(struct uart_port *port) return; s->cookie_tx = -EINVAL; + + /* + * Don't request a dma channel if no channel was specified + * in the device tree. + */ + if (!of_find_property(port->dev->of_node, "dmas", NULL)) + return; + chan = sci_request_dma_chan(port, DMA_MEM_TO_DEV); dev_dbg(port->dev, "%s: TX: got channel %p\n", __func__, chan); if (chan) { From 87837b102c0630fd559a412f6fa48c8e9710755c Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Sun, 1 Oct 2017 02:18:37 +0100 Subject: [PATCH 0145/1013] usbip: tools: Install all headers needed for libusbip development [ Upstream commit c15562c0dcb2c7f26e891923b784cf1926b8c833 ] usbip_host_driver.h now depends on several additional headers, which need to be installed along with it. 
Fixes: 021aed845303 ("staging: usbip: userspace: migrate usbip_host_driver ...") Fixes: 3391ba0e2792 ("usbip: tools: Extract generic code to be shared with ...") Signed-off-by: Ben Hutchings Acked-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/usb/usbip/Makefile.am | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tools/usb/usbip/Makefile.am b/tools/usb/usbip/Makefile.am index da3a430849a8..5961e9c18812 100644 --- a/tools/usb/usbip/Makefile.am +++ b/tools/usb/usbip/Makefile.am @@ -2,6 +2,7 @@ SUBDIRS := libsrc src includedir = @includedir@/usbip include_HEADERS := $(addprefix libsrc/, \ - usbip_common.h vhci_driver.h usbip_host_driver.h) + usbip_common.h vhci_driver.h usbip_host_driver.h \ + list.h sysfs_utils.h usbip_host_common.h) dist_man_MANS := $(addprefix doc/, usbip.8 usbipd.8) From c25bec2de9268122407a3f431dd7d031c06f293d Mon Sep 17 00:00:00 2001 From: Ian Jamison Date: Thu, 21 Sep 2017 10:13:12 +0200 Subject: [PATCH 0146/1013] serial: imx: Update cached mctrl value when changing RTS MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit a0983c742a5885f82afb282166f83f1d3d8addf4 ] UART core function uart_update_mctrl relies on a cached value of modem control lines. This was used but not updated by local RTS control functions within imx.c. These are used for RS485 line driver enable signalling. Having an out-of-date value in the cached mctrl can result in the transmitter being enabled when it shouldn't be. Fix this by updating the mctrl value before applying it. Signed-off-by: Ian Jamison Origin: id:8195c96e674517b82a6ff7fe914c7ba0f86e702b.1505375165.git.ian.dev@arkver.com Acked-by: Uwe Kleine-König Tested-by: Uwe Kleine-König Tested-by: Clemens Gruber Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serial/imx.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index dfeff3951f93..3657d745e90f 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -334,7 +334,8 @@ static void imx_port_rts_active(struct imx_port *sport, unsigned long *ucr2) { *ucr2 &= ~(UCR2_CTSC | UCR2_CTS); - mctrl_gpio_set(sport->gpios, sport->port.mctrl | TIOCM_RTS); + sport->port.mctrl |= TIOCM_RTS; + mctrl_gpio_set(sport->gpios, sport->port.mctrl); } static void imx_port_rts_inactive(struct imx_port *sport, unsigned long *ucr2) @@ -342,7 +343,8 @@ static void imx_port_rts_inactive(struct imx_port *sport, unsigned long *ucr2) *ucr2 &= ~UCR2_CTSC; *ucr2 |= UCR2_CTS; - mctrl_gpio_set(sport->gpios, sport->port.mctrl & ~TIOCM_RTS); + sport->port.mctrl &= ~TIOCM_RTS; + mctrl_gpio_set(sport->gpios, sport->port.mctrl); } static void imx_port_rts_auto(struct imx_port *sport, unsigned long *ucr2) From 9093d8bc530402a7d7651be40dd53b0b97a813af Mon Sep 17 00:00:00 2001 From: Ioana Radulescu Date: Thu, 28 Sep 2017 09:10:33 -0500 Subject: [PATCH 0147/1013] staging: fsl-mc/dpio: Fix incorrect comparison [ Upstream commit 8dabf52ffb6445fa5bcc8b6d2ecb615f60d0dd12 ] For some dpio functions, a cpu id parameter value of -1 is valid and means "any". But when trying to validate this param value against an upper limit, in this case num_possible_cpus(), we risk obtaining the wrong result due to an implicit cast. Avoid an incorrect check result by explicitly comparing the cpu id with the "any" value before verifying the upper bound. 
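The "implicit cast" at play is ordinary C integer promotion: comparing a signed -1 against an unsigned limit converts the -1 to a huge unsigned value, so the "any CPU" sentinel would be rejected as out of range. A small userspace illustration of the effect, with made-up values:

#include <stdio.h>

int main(void)
{
	int cpu = -1;			/* sentinel meaning "any CPU" */
	unsigned int limit = 8;		/* stand-in for num_possible_cpus() */

	/* -1 is converted to UINT_MAX for the comparison, so this branch is taken */
	if (cpu >= limit)
		printf("sentinel wrongly rejected\n");

	/* checking the sentinel first, as the patch does, keeps -1 valid */
	if (cpu != -1 && cpu >= limit)
		printf("only genuinely out-of-range ids get here\n");
	return 0;
}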
Signed-off-by: Ioana Radulescu Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/fsl-mc/bus/dpio/dpio-service.c | 4 ++-- drivers/staging/fsl-mc/include/dpaa2-io.h | 6 ++++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c index f8096828f5b7..a609ec82daf3 100644 --- a/drivers/staging/fsl-mc/bus/dpio/dpio-service.c +++ b/drivers/staging/fsl-mc/bus/dpio/dpio-service.c @@ -76,7 +76,7 @@ static inline struct dpaa2_io *service_select_by_cpu(struct dpaa2_io *d, if (d) return d; - if (unlikely(cpu >= num_possible_cpus())) + if (cpu != DPAA2_IO_ANY_CPU && cpu >= num_possible_cpus()) return NULL; /* @@ -121,7 +121,7 @@ struct dpaa2_io *dpaa2_io_create(const struct dpaa2_io_desc *desc) return NULL; /* check if CPU is out of range (-1 means any cpu) */ - if (desc->cpu >= num_possible_cpus()) { + if (desc->cpu != DPAA2_IO_ANY_CPU && desc->cpu >= num_possible_cpus()) { kfree(obj); return NULL; } diff --git a/drivers/staging/fsl-mc/include/dpaa2-io.h b/drivers/staging/fsl-mc/include/dpaa2-io.h index c5646096c5d4..afc2d060d077 100644 --- a/drivers/staging/fsl-mc/include/dpaa2-io.h +++ b/drivers/staging/fsl-mc/include/dpaa2-io.h @@ -54,6 +54,8 @@ struct device; * for dequeue. */ +#define DPAA2_IO_ANY_CPU -1 + /** * struct dpaa2_io_desc - The DPIO descriptor * @receives_notifications: Use notificaton mode. Non-zero if the DPIO @@ -91,8 +93,8 @@ irqreturn_t dpaa2_io_irq(struct dpaa2_io *obj); * @cb: The callback to be invoked when the notification arrives * @is_cdan: Zero for FQDAN, non-zero for CDAN * @id: FQID or channel ID, needed for rearm - * @desired_cpu: The cpu on which the notifications will show up. -1 means - * any CPU. + * @desired_cpu: The cpu on which the notifications will show up. Use + * DPAA2_IO_ANY_CPU if don't care * @dpio_id: The dpio index * @qman64: The 64-bit context value shows up in the FQDAN/CDAN. * @node: The list node From 6bb9d1772d4a07c1c71e4b82035a67c11f21ad20 Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Wed, 13 Sep 2017 10:12:09 +0200 Subject: [PATCH 0148/1013] perf test attr: Fix ignored test case result [ Upstream commit 22905582f6dd4bbd0c370fe5732c607452010c04 ] Command perf test -v 16 (Setup struct perf_event_attr test) always reports success even if the test case fails. It works correctly if you also specify -F (for don't fork). root@s35lp76 perf]# ./perf test -v 16 15: Setup struct perf_event_attr : --- start --- running './tests/attr/test-record-no-delay' [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.002 MB /tmp/tmp4E1h7R/perf.data (1 samples) ] expected task=0, got 1 expected precise_ip=0, got 3 expected wakeup_events=1, got 0 FAILED './tests/attr/test-record-no-delay' - match failure test child finished with 0 ---- end ---- Setup struct perf_event_attr: Ok The reason for the wrong error reporting is the return value of the system() library call. It is called in run_dir() file tests/attr.c and returns the exit status, in above case 0xff00. This value is given as parameter to the exit() function which can only handle values 0-0xff. The child process terminates with exit value of 0 and the parent does not detect any error. This patch corrects the error reporting and prints the correct test result. 
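The underlying C pitfall in the perf test is that system() returns a wait status rather than an exit code, and exit() keeps only the low 8 bits, so a status such as 0xff00 collapses to 0. A small userspace illustration with a made-up command:

#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

int main(void)
{
	int status = system("exit 1");		/* child exits with code 1 */

	printf("raw wait status: %#x\n", status);	/* 0x100, not 1 */
	if (WIFEXITED(status))
		printf("exit code: %d\n", WEXITSTATUS(status));

	/* exit(status) would truncate 0x100 to 0 and hide the failure;
	 * map the status to an explicit pass/fail value instead. */
	return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}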
Signed-off-by: Thomas-Mich Richter Acked-by: Jiri Olsa Cc: Heiko Carstens Cc: Hendrik Brueckner Cc: Martin Schwidefsky Cc: Thomas-Mich Richter LPU-Reference: 20170913081209.39570-2-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-rdube6rfcjsr1nzue72c7lqn@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/tests/attr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c index c180bbcdbef6..0e1367f90af5 100644 --- a/tools/perf/tests/attr.c +++ b/tools/perf/tests/attr.c @@ -167,7 +167,7 @@ static int run_dir(const char *d, const char *perf) snprintf(cmd, 3*PATH_MAX, PYTHON " %s/attr.py -d %s/attr/ -p %s %.*s", d, d, perf, vcnt, v); - return system(cmd); + return system(cmd) ? TEST_FAIL : TEST_OK; } int test__attr(struct test *test __maybe_unused, int subtest __maybe_unused) From c530d86ded9becc305aaf984cab5d547b6fd6480 Mon Sep 17 00:00:00 2001 From: Thomas Richter Date: Wed, 13 Sep 2017 10:12:08 +0200 Subject: [PATCH 0149/1013] perf test attr: Fix python error on empty result [ Upstream commit 3440fe2790aa3d13530260af6033533b18959aee ] Commit d78ada4a767 ("perf tests attr: Do not store failed events") does not create an event file in the /tmp directory when the perf_open_event() system call failed. This can lead to a situation where not /tmp/event-xx-yy-zz result file exists at all (for example on a s390x virtual machine environment) where no CPUMF hardware is available. The following command then fails with a python call back chain instead of printing failure: [root@s8360046 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \ -p ./perf -v -ttest-stat-basic running './tests/attr//test-stat-basic' Traceback (most recent call last): File "./tests/attr.py", line 379, in main() File "./tests/attr.py", line 370, in main run_tests(options) File "./tests/attr.py", line 311, in run_tests Test(f, options).run() File "./tests/attr.py", line 300, in run self.compare(self.expect, self.result) File "./tests/attr.py", line 248, in compare exp_event.diff(res_event) UnboundLocalError: local variable 'res_event' referenced before assignment [root@s8360046 perf]# This patch catches this pitfall and prints an error message instead: [root@s8360047 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \ -p ./perf -vvv -ttest-stat-basic running './tests/attr//test-stat-basic' loading expected events Event event:base-stat fd = 1 group_fd = -1 flags = 0|8 [....] 
sample_regs_user = 0 sample_stack_user = 0 'PERF_TEST_ATTR=/tmp/tmpJbMQMP ./perf stat -o /tmp/tmpJbMQMP/perf.data -e cycles kill >/dev/null 2>&1' ret '1', expected '1' loading result events compare matching [event:base-stat] match: [event:base-stat] matches [] res_event is empty FAILED './tests/attr//test-stat-basic' - match failure [root@s8360047 perf]# Signed-off-by: Thomas-Mich Richter Acked-by: Jiri Olsa Cc: Heiko Carstens Cc: Hendrik Brueckner Cc: Martin Schwidefsky Cc: Thomas-Mich Richter LPU-Reference: 20170913081209.39570-1-tmricht@linux.vnet.ibm.com Link: http://lkml.kernel.org/n/tip-04d63nn7svfgxdhi60gq2mlm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/tests/attr.py | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tools/perf/tests/attr.py b/tools/perf/tests/attr.py index 907b1b2f56ad..ff9b60b99f52 100644 --- a/tools/perf/tests/attr.py +++ b/tools/perf/tests/attr.py @@ -238,6 +238,7 @@ def compare(self, expect, result): # events in result. Fail if there's not any. for exp_name, exp_event in expect.items(): exp_list = [] + res_event = {} log.debug(" matching [%s]" % exp_name) for res_name, res_event in result.items(): log.debug(" to [%s]" % res_name) @@ -254,7 +255,10 @@ def compare(self, expect, result): if exp_event.optional(): log.debug(" %s does not match, but is optional" % exp_name) else: - exp_event.diff(res_event) + if not res_event: + log.debug(" res_event is empty"); + else: + exp_event.diff(res_event) raise Fail(self, 'match failure'); match[exp_name] = exp_list From 4d0d3ac1ec32444ad985cb92272c94d07e43e9f7 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Tue, 19 Sep 2017 19:01:40 +0900 Subject: [PATCH 0150/1013] kprobes/x86: Disable preemption in ftrace-based jprobes [ Upstream commit 5bb4fc2d8641219732eb2bb654206775a4219aca ] Disable preemption in ftrace-based jprobe handlers as described in Documentation/kprobes.txt: "Probe handlers are run with preemption disabled." This will fix jprobes behavior when CONFIG_PREEMPT=y. Signed-off-by: Masami Hiramatsu Cc: Alexei Starovoitov Cc: Alexei Starovoitov Cc: Ananth N Mavinakayanahalli Cc: Linus Torvalds Cc: Paul E . 
McKenney Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/150581530024.32348.9863783558598926771.stgit@devbox Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/kprobes/ftrace.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/kprobes/ftrace.c b/arch/x86/kernel/kprobes/ftrace.c index 041f7b6dfa0f..bcfee4f69b0e 100644 --- a/arch/x86/kernel/kprobes/ftrace.c +++ b/arch/x86/kernel/kprobes/ftrace.c @@ -26,7 +26,7 @@ #include "common.h" static nokprobe_inline -int __skip_singlestep(struct kprobe *p, struct pt_regs *regs, +void __skip_singlestep(struct kprobe *p, struct pt_regs *regs, struct kprobe_ctlblk *kcb, unsigned long orig_ip) { /* @@ -41,20 +41,21 @@ int __skip_singlestep(struct kprobe *p, struct pt_regs *regs, __this_cpu_write(current_kprobe, NULL); if (orig_ip) regs->ip = orig_ip; - return 1; } int skip_singlestep(struct kprobe *p, struct pt_regs *regs, struct kprobe_ctlblk *kcb) { - if (kprobe_ftrace(p)) - return __skip_singlestep(p, regs, kcb, 0); - else - return 0; + if (kprobe_ftrace(p)) { + __skip_singlestep(p, regs, kcb, 0); + preempt_enable_no_resched(); + return 1; + } + return 0; } NOKPROBE_SYMBOL(skip_singlestep); -/* Ftrace callback handler for kprobes */ +/* Ftrace callback handler for kprobes -- called under preepmt disabed */ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *ops, struct pt_regs *regs) { @@ -77,13 +78,17 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, /* Kprobe handler expects regs->ip = ip + 1 as breakpoint hit */ regs->ip = ip + sizeof(kprobe_opcode_t); + /* To emulate trap based kprobes, preempt_disable here */ + preempt_disable(); __this_cpu_write(current_kprobe, p); kcb->kprobe_status = KPROBE_HIT_ACTIVE; - if (!p->pre_handler || !p->pre_handler(p, regs)) + if (!p->pre_handler || !p->pre_handler(p, regs)) { __skip_singlestep(p, regs, kcb, orig_ip); + preempt_enable_no_resched(); + } /* * If pre_handler returns !0, it sets regs->ip and - * resets current kprobe. + * resets current kprobe, and keep preempt count +1. */ } end: From bef9bcf53789a454f54adf01c654650189628ab0 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Sat, 2 Sep 2017 13:09:45 -0700 Subject: [PATCH 0151/1013] locking/refcounts, x86/asm: Use unique .text section for refcount exceptions [ Upstream commit 564c9cc84e2adf8a6671c1937f0a9fe3da2a4b0e ] Using .text.unlikely for refcount exceptions isn't safe because gcc may move entire functions into .text.unlikely (e.g. in6_dev_dev()), which would cause any uses of a protected refcount_t function to stay inline with the function, triggering the protection unconditionally: .section .text.unlikely,"ax",@progbits .type in6_dev_get, @function in6_dev_getx: .LFB4673: .loc 2 4128 0 .cfi_startproc ... lock; incl 480(%rbx) js 111f .pushsection .text.unlikely 111: lea 480(%rbx), %rcx 112: .byte 0x0f, 0xff .popsection 113: This creates a unique .text..refcount section and adds an additional test to the exception handler to WARN in the case of having none of OF, SF, nor ZF set so we can see things like this more easily in the future. The double dot for the section name keeps it out of the TEXT_MAIN macro namespace, to avoid collisions and so it can be put at the end with text.unlikely to keep the cold code together. See commit: cb87481ee89db ("kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured") ... 
which matches C names: [a-zA-Z0-9_] but not ".". Reported-by: Mike Galbraith Signed-off-by: Kees Cook Cc: Ard Biesheuvel Cc: Elena Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-arch Fixes: 7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection") Link: http://lkml.kernel.org/r/1504382986-49301-2-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/refcount.h | 2 +- arch/x86/mm/extable.c | 7 ++++++- include/asm-generic/vmlinux.lds.h | 1 + 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h index ff871210b9f2..4e44250e7d0d 100644 --- a/arch/x86/include/asm/refcount.h +++ b/arch/x86/include/asm/refcount.h @@ -15,7 +15,7 @@ * back to the regular execution flow in .text. */ #define _REFCOUNT_EXCEPTION \ - ".pushsection .text.unlikely\n" \ + ".pushsection .text..refcount\n" \ "111:\tlea %[counter], %%" _ASM_CX "\n" \ "112:\t" ASM_UD0 "\n" \ ASM_UNREACHABLE \ diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c index c3521e2be396..3321b446b66c 100644 --- a/arch/x86/mm/extable.c +++ b/arch/x86/mm/extable.c @@ -67,12 +67,17 @@ bool ex_handler_refcount(const struct exception_table_entry *fixup, * wrapped around) will be set. Additionally, seeing the refcount * reach 0 will set ZF (Zero Flag: result was zero). In each of * these cases we want a report, since it's a boundary condition. - * + * The SF case is not reported since it indicates post-boundary + * manipulations below zero or above INT_MAX. And if none of the + * flags are set, something has gone very wrong, so report it. */ if (regs->flags & (X86_EFLAGS_OF | X86_EFLAGS_ZF)) { bool zero = regs->flags & X86_EFLAGS_ZF; refcount_error_report(regs, zero ? "hit zero" : "overflow"); + } else if ((regs->flags & X86_EFLAGS_SF) == 0) { + /* Report if none of OF, ZF, nor SF are set. */ + refcount_error_report(regs, "unexpected saturation"); } return true; diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 8acfc1e099e1..e549bff87c5b 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -459,6 +459,7 @@ #define TEXT_TEXT \ ALIGN_FUNCTION(); \ *(.text.hot TEXT_MAIN .text.fixup .text.unlikely) \ + *(.text..refcount) \ *(.ref.text) \ MEM_KEEP(init.text) \ MEM_KEEP(exit.text) \ From a9f359f24ce70a291997b3644aeead56d0c44cd6 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 11 Sep 2017 11:24:23 +0200 Subject: [PATCH 0152/1013] s390/ptrace: fix guarded storage regset handling [ Upstream commit 5ef2d5231d547c672c67bdf84c13a4adaf477964 ] If the guarded storage regset for current is supposed to be changed, the regset from user space is copied directly into the guarded storage control block. If then the process gets scheduled away while the control block is being copied and before the new control block has been loaded, the result is random: the process can be scheduled away due to a page fault or preemption. If that happens the already copied parts will be overwritten by save_gs_cb(), called from switch_to(). Avoid this by copying the data to a temporary buffer on the stack and do the actual update with preemption disabled. 
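Reduced to a sketch, the approach is to stage the incoming regset in a buffer on the stack and only commit it with preemption disabled (the real hunk below additionally handles first-time allocation of the control block):

    struct gs_cb gs_cb = { };
    int rc;

    /* copy the new register set into a stack buffer first */
    rc = user_regset_copyin(&pos, &count, &kbuf, &ubuf,
                            &gs_cb, 0, sizeof(gs_cb));
    if (rc)
            return -EFAULT;

    /* commit without being scheduled away halfway through */
    preempt_disable();
    *target->thread.gs_cb = gs_cb;
    if (target == current) {
            __ctl_set_bit(2, 4);
            restore_gs_cb(target->thread.gs_cb);
    }
    preempt_enable();

This way a page fault or preemption during the copy-in can no longer leave a half-written control block to be clobbered by save_gs_cb() from switch_to().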
Fixes: f5bbd7219891 ("s390/ptrace: guarded storage regset for the current task") Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/ptrace.c | 33 ++++++++++++++++++++++----------- 1 file changed, 22 insertions(+), 11 deletions(-) diff --git a/arch/s390/kernel/ptrace.c b/arch/s390/kernel/ptrace.c index 1427d60ce628..56e0190d6e65 100644 --- a/arch/s390/kernel/ptrace.c +++ b/arch/s390/kernel/ptrace.c @@ -1172,26 +1172,37 @@ static int s390_gs_cb_set(struct task_struct *target, unsigned int pos, unsigned int count, const void *kbuf, const void __user *ubuf) { - struct gs_cb *data = target->thread.gs_cb; + struct gs_cb gs_cb = { }, *data = NULL; int rc; if (!MACHINE_HAS_GS) return -ENODEV; - if (!data) { + if (!target->thread.gs_cb) { data = kzalloc(sizeof(*data), GFP_KERNEL); if (!data) return -ENOMEM; - data->gsd = 25; - target->thread.gs_cb = data; - if (target == current) - __ctl_set_bit(2, 4); - } else if (target == current) { - save_gs_cb(data); } + if (!target->thread.gs_cb) + gs_cb.gsd = 25; + else if (target == current) + save_gs_cb(&gs_cb); + else + gs_cb = *target->thread.gs_cb; rc = user_regset_copyin(&pos, &count, &kbuf, &ubuf, - data, 0, sizeof(struct gs_cb)); - if (target == current) - restore_gs_cb(data); + &gs_cb, 0, sizeof(gs_cb)); + if (rc) { + kfree(data); + return -EFAULT; + } + preempt_disable(); + if (!target->thread.gs_cb) + target->thread.gs_cb = data; + *target->thread.gs_cb = gs_cb; + if (target == current) { + __ctl_set_bit(2, 4); + restore_gs_cb(target->thread.gs_cb); + } + preempt_enable(); return rc; } From cf33c88d64557ab627eca91c79c8a7891a9ba9d4 Mon Sep 17 00:00:00 2001 From: Arnaldo Carvalho de Melo Date: Thu, 21 Sep 2017 12:12:17 -0300 Subject: [PATCH 0153/1013] tools include: Do not use poison with C++ MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 6ae8eefc6c8fe050f057781b70a83262eb0a61ee ] LIST_POISON[12] are used to initialize list_head and hlist_node pointers, and do void pointer arithmetic, which C++ doesn't like, so, to avoid drifting from the kernel by introducing some HLIST_POISON to do away with void pointer math, just make those poisoned pointers be NULL when building it with a C++ compiler. 
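For context, the construct g++ objects to is arithmetic on a void pointer, which GNU C accepts as an extension (sizeof(void) is treated as 1) but C++ rejects; roughly:

    #define POISON_POINTER_DELTA 0

    /* fine in GNU C, "pointer of type 'void *' used in arithmetic" under g++ */
    #define LIST_POISON1 ((void *) 0x100 + POISON_POINTER_DELTA)

Guarding the definitions with #ifdef __cplusplus and falling back to plain NULL sidesteps the arithmetic for C++ translation units without changing the C side.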
Noticed with: $ make LLVM_CONFIG=/usr/bin/llvm-config-3.9 LIBCLANGLLVM=1 CXX util/c++/clang.o CXX util/c++/clang-test.o In file included from /home/lizj/linux/tools/include/linux/list.h:5:0, from /home/lizj/linux/tools/perf/util/namespaces.h:13, from /home/lizj/linux/tools/perf/util/util.h:15, from /home/lizj/linux/tools/perf/util/util-cxx.h:20, from util/c++/clang-c.h:5, from util/c++/clang-test.cpp:2: /home/lizj/linux/tools/include/linux/list.h: In function ‘void list_del(list_head*)’: /home/lizj/linux/tools/include/linux/poison.h:14:31: error: pointer of type ‘void *’ used in arithmetic [-Werror=pointer-arith] # define POISON_POINTER_DELTA 0 ^ /home/lizj/linux/tools/include/linux/poison.h:22:41: note: in expansion of macro ‘POISON_POINTER_DELTA’ #define LIST_POISON1 ((void *) 0x100 + POISON_POINTER_DELTA) ^ /home/lizj/linux/tools/include/linux/list.h:107:16: note: in expansion of macro ‘LIST_POISON1’ entry->next = LIST_POISON1; ^ In file included from /home/lizj/linux/tools/perf/util/namespaces.h:13:0, from /home/lizj/linux/tools/perf/util/util.h:15, from /home/lizj/linux/tools/perf/util/util-cxx.h:20, from util/c++/clang-c.h:5, from util/c++/clang-test.cpp:2: /home/lizj/linux/tools/include/linux/list.h:107:14: error: invalid conversion from ‘void*’ to ‘list_head*’ [-fpermissive] Reported-by: Li Zhijian Cc: Adrian Hunter Cc: Alexander Shishkin Cc: David Ahern Cc: Jiri Olsa Cc: Namhyung Kim Cc: Philip Li Cc: Wang Nan Link: http://lkml.kernel.org/n/tip-m5ei2o0mjshucbr28baf5lqz@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/include/linux/poison.h | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tools/include/linux/poison.h b/tools/include/linux/poison.h index 4bf6777a8a03..9fdcd3eaac3b 100644 --- a/tools/include/linux/poison.h +++ b/tools/include/linux/poison.h @@ -15,6 +15,10 @@ # define POISON_POINTER_DELTA 0 #endif +#ifdef __cplusplus +#define LIST_POISON1 NULL +#define LIST_POISON2 NULL +#else /* * These are non-NULL pointers that will result in page faults * under normal circumstances, used to verify that nobody uses @@ -22,6 +26,7 @@ */ #define LIST_POISON1 ((void *) 0x100 + POISON_POINTER_DELTA) #define LIST_POISON2 ((void *) 0x200 + POISON_POINTER_DELTA) +#endif /********** include/linux/timer.h **********/ /* From 6d29ae49cd31597586835fd22957adba5368cdb1 Mon Sep 17 00:00:00 2001 From: Martin Kepplinger Date: Wed, 13 Sep 2017 21:14:19 +0200 Subject: [PATCH 0154/1013] perf tools: Fix leaking rec_argv in error cases [ Upstream commit c896f85a7c15ab9d040ffac8b8003e47996602a2 ] Let's free the allocated rec_argv in case we return early, in order to avoid leaking memory. This adds free() at a few very similar places across the tree where it was missing. 
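The shape of the fix is the same in each tool: the argument vector is calloc'd up front and an early error return forgot to release it. A minimal sketch (names are illustrative, not the exact perf code):

    const char **rec_argv;

    rec_argv = calloc(rec_argc + 1, sizeof(char *));
    if (rec_argv == NULL)
            return -ENOMEM;

    if (!event_is_supported) {              /* stand-in for the real check */
            pr_err("failed: event not supported\n");
            free(rec_argv);                 /* was missing on this path */
            return -1;
    }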
Signed-off-by: Martin Kepplinger Cc: Alexander Shishkin Cc: Martin kepplinger Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20170913191419.29806-1-martink@posteo.de Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/perf/builtin-c2c.c | 1 + tools/perf/builtin-mem.c | 1 + tools/perf/builtin-timechart.c | 4 +++- tools/perf/builtin-trace.c | 1 + 4 files changed, 6 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c index fd32ad08c6d4..d00aac51130d 100644 --- a/tools/perf/builtin-c2c.c +++ b/tools/perf/builtin-c2c.c @@ -2733,6 +2733,7 @@ static int perf_c2c__record(int argc, const char **argv) if (!perf_mem_events[j].supported) { pr_err("failed: event '%s' not supported\n", perf_mem_events[j].name); + free(rec_argv); return -1; } diff --git a/tools/perf/builtin-mem.c b/tools/perf/builtin-mem.c index 4db960085273..e15efba605f6 100644 --- a/tools/perf/builtin-mem.c +++ b/tools/perf/builtin-mem.c @@ -113,6 +113,7 @@ static int __cmd_record(int argc, const char **argv, struct perf_mem *mem) if (!perf_mem_events[j].supported) { pr_err("failed: event '%s' not supported\n", perf_mem_events__name(j)); + free(rec_argv); return -1; } diff --git a/tools/perf/builtin-timechart.c b/tools/perf/builtin-timechart.c index 4e2e61695986..01de01ca14f2 100644 --- a/tools/perf/builtin-timechart.c +++ b/tools/perf/builtin-timechart.c @@ -1732,8 +1732,10 @@ static int timechart__io_record(int argc, const char **argv) if (rec_argv == NULL) return -ENOMEM; - if (asprintf(&filter, "common_pid != %d", getpid()) < 0) + if (asprintf(&filter, "common_pid != %d", getpid()) < 0) { + free(rec_argv); return -ENOMEM; + } p = rec_argv; for (i = 0; i < common_args_nr; i++) diff --git a/tools/perf/builtin-trace.c b/tools/perf/builtin-trace.c index d5d7fff1c211..8e3c4ec00017 100644 --- a/tools/perf/builtin-trace.c +++ b/tools/perf/builtin-trace.c @@ -2086,6 +2086,7 @@ static int trace__record(struct trace *trace, int argc, const char **argv) rec_argv[j++] = "syscalls:sys_enter,syscalls:sys_exit"; else { pr_err("Neither raw_syscalls nor syscalls events exist.\n"); + free(rec_argv); return -1; } } From 342ee877580058554fe68480756c7d326b625bed Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Sat, 9 Sep 2017 00:56:03 +0300 Subject: [PATCH 0155/1013] mm, x86/mm: Fix performance regression in get_user_pages_fast() [ Upstream commit 5b65c4677a57a1d4414212f9995aa0e46a21ff80 ] The 0-day test bot found a performance regression that was tracked down to switching x86 to the generic get_user_pages_fast() implementation: http://lkml.kernel.org/r/20170710024020.GA26389@yexl-desktop The regression was caused by the fact that we now use local_irq_save() + local_irq_restore() in get_user_pages_fast() to disable interrupts. In x86 implementation local_irq_disable() + local_irq_enable() was used. The fix is to make get_user_pages_fast() use local_irq_disable(), leaving local_irq_save() for __get_user_pages_fast() that can be called with interrupts disabled. Numbers for pinning a gigabyte of memory, one page a time, 20 repeats: Before: Average: 14.91 ms, stddev: 0.45 ms After: Average: 10.76 ms, stddev: 0.18 ms Signed-off-by: Kirill A. 
Shutemov Cc: Andrew Morton Cc: Huang Ying Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Thorsten Leemhuis Cc: linux-mm@kvack.org Fixes: e585513b76f7 ("x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation") Link: http://lkml.kernel.org/r/20170908215603.9189-3-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/gup.c | 97 +++++++++++++++++++++++++++++++++----------------------- 1 file changed, 58 insertions(+), 39 deletions(-) diff --git a/mm/gup.c b/mm/gup.c index 165ba2174c75..e0d82b6706d7 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1707,6 +1707,47 @@ static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end, return 1; } +static void gup_pgd_range(unsigned long addr, unsigned long end, + int write, struct page **pages, int *nr) +{ + unsigned long next; + pgd_t *pgdp; + + pgdp = pgd_offset(current->mm, addr); + do { + pgd_t pgd = READ_ONCE(*pgdp); + + next = pgd_addr_end(addr, end); + if (pgd_none(pgd)) + return; + if (unlikely(pgd_huge(pgd))) { + if (!gup_huge_pgd(pgd, pgdp, addr, next, write, + pages, nr)) + return; + } else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) { + if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr, + PGDIR_SHIFT, next, write, pages, nr)) + return; + } else if (!gup_p4d_range(pgd, addr, next, write, pages, nr)) + return; + } while (pgdp++, addr = next, addr != end); +} + +#ifndef gup_fast_permitted +/* + * Check if it's allowed to use __get_user_pages_fast() for the range, or + * we need to fall back to the slow version: + */ +bool gup_fast_permitted(unsigned long start, int nr_pages, int write) +{ + unsigned long len, end; + + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + return end >= start; +} +#endif + /* * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to * the regular GUP. It will only return non-negative values. @@ -1714,10 +1755,8 @@ static int gup_p4d_range(pgd_t pgd, unsigned long addr, unsigned long end, int __get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages) { - struct mm_struct *mm = current->mm; unsigned long addr, len, end; - unsigned long next, flags; - pgd_t *pgdp; + unsigned long flags; int nr = 0; start &= PAGE_MASK; @@ -1741,45 +1780,15 @@ int __get_user_pages_fast(unsigned long start, int nr_pages, int write, * block IPIs that come from THPs splitting. 
*/ - local_irq_save(flags); - pgdp = pgd_offset(mm, addr); - do { - pgd_t pgd = READ_ONCE(*pgdp); - - next = pgd_addr_end(addr, end); - if (pgd_none(pgd)) - break; - if (unlikely(pgd_huge(pgd))) { - if (!gup_huge_pgd(pgd, pgdp, addr, next, write, - pages, &nr)) - break; - } else if (unlikely(is_hugepd(__hugepd(pgd_val(pgd))))) { - if (!gup_huge_pd(__hugepd(pgd_val(pgd)), addr, - PGDIR_SHIFT, next, write, pages, &nr)) - break; - } else if (!gup_p4d_range(pgd, addr, next, write, pages, &nr)) - break; - } while (pgdp++, addr = next, addr != end); - local_irq_restore(flags); + if (gup_fast_permitted(start, nr_pages, write)) { + local_irq_save(flags); + gup_pgd_range(addr, end, write, pages, &nr); + local_irq_restore(flags); + } return nr; } -#ifndef gup_fast_permitted -/* - * Check if it's allowed to use __get_user_pages_fast() for the range, or - * we need to fall back to the slow version: - */ -bool gup_fast_permitted(unsigned long start, int nr_pages, int write) -{ - unsigned long len, end; - - len = (unsigned long) nr_pages << PAGE_SHIFT; - end = start + len; - return end >= start; -} -#endif - /** * get_user_pages_fast() - pin user pages in memory * @start: starting user address @@ -1799,12 +1808,22 @@ bool gup_fast_permitted(unsigned long start, int nr_pages, int write) int get_user_pages_fast(unsigned long start, int nr_pages, int write, struct page **pages) { + unsigned long addr, len, end; int nr = 0, ret = 0; start &= PAGE_MASK; + addr = start; + len = (unsigned long) nr_pages << PAGE_SHIFT; + end = start + len; + + if (unlikely(!access_ok(write ? VERIFY_WRITE : VERIFY_READ, + (void __user *)start, len))) + return 0; if (gup_fast_permitted(start, nr_pages, write)) { - nr = __get_user_pages_fast(start, nr_pages, write, pages); + local_irq_disable(); + gup_pgd_range(addr, end, write, pages, &nr); + local_irq_enable(); ret = nr; } From 68744edd6edcfa663ab917f62827919a0ca95b05 Mon Sep 17 00:00:00 2001 From: Ladislav Michl Date: Fri, 25 Aug 2017 07:39:16 +0200 Subject: [PATCH 0156/1013] iio: adc: ti-ads1015: add 10% to conversion wait time MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit fe895ac88b9fbdf2026f0bfd56c82747bb9d7c48 ] As user's guide "ADS1015EVM, ADS1115EVM, ADS1015EVM-PDK, ADS1115EVM-PDK User Guide (Rev. B)" (http://www.ti.com/lit/ug/sbau157b/sbau157b.pdf) states at page 16: "Note that both the ADS1115 and ADS1015 have internal clocks with a ±10% accuracy. If performing FFT tests, frequencies may appear to be incorrect as a result of this tolerance range.", add those 10% to converion wait time. Cc: Daniel Baluta Cc: Jonathan Cameron Signed-off-by: Ladislav Michl Reviewed-by: Akinobu Mita Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/ti-ads1015.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/iio/adc/ti-ads1015.c b/drivers/iio/adc/ti-ads1015.c index e0dc20488335..9ac2fb032df6 100644 --- a/drivers/iio/adc/ti-ads1015.c +++ b/drivers/iio/adc/ti-ads1015.c @@ -369,6 +369,7 @@ int ads1015_get_adc_result(struct ads1015_data *data, int chan, int *val) conv_time = DIV_ROUND_UP(USEC_PER_SEC, data->data_rate[dr_old]); conv_time += DIV_ROUND_UP(USEC_PER_SEC, data->data_rate[dr]); + conv_time += conv_time / 10; /* 10% internal clock inaccuracy */ usleep_range(conv_time, conv_time + 1); data->conv_invalid = false; } From 53ce4d4aad93e2b48999ab9a225a84bfa54760fe Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. 
Silva" Date: Thu, 6 Jul 2017 23:53:11 -0500 Subject: [PATCH 0157/1013] iio: multiplexer: add NULL check on devm_kzalloc() and devm_kmemdup() return values [ Upstream commit dd92d5ea20ef8a42be7aeda08c669c586c730451 ] Check return values from call to devm_kzalloc() and devm_kmemup() in order to prevent a NULL pointer dereference. This issue was detected using Coccinelle and the following semantic patch: @@ expression x; identifier fld; @@ * x = devm_kzalloc(...); ... when != x == NULL x->fld Fixes: 7ba9df54b091 ("iio: multiplexer: new iio category and iio-mux driver") Signed-off-by: Gustavo A. R. Silva Acked-by: Peter Rosin Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/iio/multiplexer/iio-mux.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/iio/multiplexer/iio-mux.c b/drivers/iio/multiplexer/iio-mux.c index 37ba007f8dca..74831fcd0313 100644 --- a/drivers/iio/multiplexer/iio-mux.c +++ b/drivers/iio/multiplexer/iio-mux.c @@ -285,6 +285,9 @@ static int mux_configure_channel(struct device *dev, struct mux *mux, child->ext_info_cache = devm_kzalloc(dev, sizeof(*child->ext_info_cache) * num_ext_info, GFP_KERNEL); + if (!child->ext_info_cache) + return -ENOMEM; + for (i = 0; i < num_ext_info; ++i) { child->ext_info_cache[i].size = -1; @@ -309,6 +312,9 @@ static int mux_configure_channel(struct device *dev, struct mux *mux, child->ext_info_cache[i].data = devm_kmemdup(dev, page, ret + 1, GFP_KERNEL); + if (!child->ext_info_cache[i].data) + return -ENOMEM; + child->ext_info_cache[i].data[ret] = 0; child->ext_info_cache[i].size = ret; } From 340d45d70850ea395a7e397a94ab59972581022c Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Sat, 2 Sep 2017 13:09:46 -0700 Subject: [PATCH 0158/1013] locking/refcounts, x86/asm: Enable CONFIG_ARCH_HAS_REFCOUNT [ Upstream commit 39208aa7ecb7d9c4e86df782b5693270313cbab1 ] With the section inlining bug fixed for the x86 refcount protection, we can turn the config back on. Signed-off-by: Kees Cook Cc: Ard Biesheuvel Cc: Elena Cc: Linus Torvalds Cc: Mike Galbraith Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-arch Link: http://lkml.kernel.org/r/1504382986-49301-3-git-send-email-keescook@chromium.org Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 2fdb23313dd5..9bceea6a5852 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -56,7 +56,7 @@ config X86 select ARCH_HAS_KCOV if X86_64 select ARCH_HAS_PMEM_API if X86_64 # Causing hangs/crashes, see the commit that added this change for details. - select ARCH_HAS_REFCOUNT if BROKEN + select ARCH_HAS_REFCOUNT select ARCH_HAS_UACCESS_FLUSHCACHE if X86_64 select ARCH_HAS_SET_MEMORY select ARCH_HAS_SG_CHAIN From f8d07852819a0c2da239fca155342b4f80f4aedf Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Fri, 22 Sep 2017 14:40:47 +0530 Subject: [PATCH 0159/1013] powerpc/jprobes: Disable preemption when triggered through ftrace commit 6baea433bc84cd148af1c524389a8d756f67412e upstream. 
KPROBES_SANITY_TEST throws the below splat when CONFIG_PREEMPT is enabled: Kprobe smoke test: started DEBUG_LOCKS_WARN_ON(val > preempt_count()) ------------[ cut here ]------------ WARNING: CPU: 19 PID: 1 at kernel/sched/core.c:3094 preempt_count_sub+0xcc/0x140 Modules linked in: CPU: 19 PID: 1 Comm: swapper/0 Not tainted 4.13.0-rc7-nnr+ #97 task: c0000000fea80000 task.stack: c0000000feb00000 NIP: c00000000011d3dc LR: c00000000011d3d8 CTR: c000000000a090d0 REGS: c0000000feb03400 TRAP: 0700 Not tainted (4.13.0-rc7-nnr+) MSR: 8000000000021033 CR: 28000282 XER: 00000000 CFAR: c00000000015aa18 SOFTE: 0 NIP preempt_count_sub+0xcc/0x140 LR preempt_count_sub+0xc8/0x140 Call Trace: preempt_count_sub+0xc8/0x140 (unreliable) kprobe_handler+0x228/0x4b0 program_check_exception+0x58/0x3b0 program_check_common+0x16c/0x170 --- interrupt: 0 at kprobe_target+0x8/0x20 LR = init_test_probes+0x248/0x7d0 kp+0x0/0x80 (unreliable) livepatch_handler+0x38/0x74 init_kprobes+0x1d8/0x208 do_one_initcall+0x68/0x1d0 kernel_init_freeable+0x298/0x374 kernel_init+0x24/0x160 ret_from_kernel_thread+0x5c/0x70 Instruction dump: 419effdc 3d22001b 39299240 81290000 2f890000 409effc8 3c82ffcb 3c62ffcb 3884bc68 3863bc18 4803d5fd 60000000 <0fe00000> 4bffffa8 60000000 60000000 ---[ end trace 432dd46b4ce3d29f ]--- Kprobe smoke test: passed successfully The issue is that we aren't disabling preemption in kprobe_ftrace_handler(). Disable it. Fixes: ead514d5fb30a0 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE") Acked-by: Masami Hiramatsu Signed-off-by: Naveen N. Rao [mpe: Trim oops a little for formatting] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/kprobes-ftrace.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c index 6c089d9757c9..2d81404f818c 100644 --- a/arch/powerpc/kernel/kprobes-ftrace.c +++ b/arch/powerpc/kernel/kprobes-ftrace.c @@ -65,6 +65,7 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip, /* Disable irq for emulating a breakpoint and avoiding preempt */ local_irq_save(flags); hard_irq_disable(); + preempt_disable(); p = get_kprobe((kprobe_opcode_t *)nip); if (unlikely(!p) || kprobe_disabled(p)) @@ -86,12 +87,18 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip, kcb->kprobe_status = KPROBE_HIT_ACTIVE; if (!p->pre_handler || !p->pre_handler(p, regs)) __skip_singlestep(p, regs, kcb, orig_nip); - /* - * If pre_handler returns !0, it sets regs->nip and - * resets current kprobe. - */ + else { + /* + * If pre_handler returns !0, it sets regs->nip and + * resets current kprobe. In this case, we still need + * to restore irq, but not preemption. + */ + local_irq_restore(flags); + return; + } } end: + preempt_enable_no_resched(); local_irq_restore(flags); } NOKPROBE_SYMBOL(kprobe_ftrace_handler); From 1ffabfc1d58ba586fa11cc3cde21d54afec03387 Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Mon, 23 Oct 2017 22:07:38 +0530 Subject: [PATCH 0160/1013] powerpc/kprobes: Disable preemption before invoking probe handler for optprobes commit 8a2d71a3f2737e2448aa68de2b6052cb570d3d2a upstream. Per Documentation/kprobes.txt, probe handlers need to be invoked with preemption disabled. Update optimized_callback() to do so. Also move get_kprobe_ctlblk() invocation post preemption disable, since it accesses pre-cpu data. This was not an issue so far since optprobes wasn't selected if CONFIG_PREEMPT was enabled. 
Commit a30b85df7d599f ("kprobes: Use synchronize_rcu_tasks() for optprobe with CONFIG_PREEMPT=y") changes this. Signed-off-by: Naveen N. Rao Acked-by: Masami Hiramatsu Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/optprobes.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/optprobes.c b/arch/powerpc/kernel/optprobes.c index 91e037ab20a1..60ba7f1370a8 100644 --- a/arch/powerpc/kernel/optprobes.c +++ b/arch/powerpc/kernel/optprobes.c @@ -115,7 +115,6 @@ static unsigned long can_optimize(struct kprobe *p) static void optimized_callback(struct optimized_kprobe *op, struct pt_regs *regs) { - struct kprobe_ctlblk *kcb = get_kprobe_ctlblk(); unsigned long flags; /* This is possible if op is under delayed unoptimizing */ @@ -124,13 +123,14 @@ static void optimized_callback(struct optimized_kprobe *op, local_irq_save(flags); hard_irq_disable(); + preempt_disable(); if (kprobe_running()) { kprobes_inc_nmissed_count(&op->kp); } else { __this_cpu_write(current_kprobe, &op->kp); regs->nip = (unsigned long)op->kp.addr; - kcb->kprobe_status = KPROBE_HIT_ACTIVE; + get_kprobe_ctlblk()->kprobe_status = KPROBE_HIT_ACTIVE; opt_pre_handler(&op->kp, regs); __this_cpu_write(current_kprobe, NULL); } @@ -140,6 +140,7 @@ static void optimized_callback(struct optimized_kprobe *op, * local_irq_restore() will re-enable interrupts, * if they were hard disabled. */ + preempt_enable_no_resched(); local_irq_restore(flags); } NOKPROBE_SYMBOL(optimized_callback); From 779bfa90bdf97a6d00459471f6d371f6036f3da5 Mon Sep 17 00:00:00 2001 From: Dominik Behr Date: Thu, 7 Sep 2017 16:02:46 -0300 Subject: [PATCH 0161/1013] dma-buf/sw_sync: force signal all unsignaled fences on dying timeline commit ea4d5a270b57fa8d4871f372ca9b97b7697fdfda upstream. To avoid hanging userspace components that might have been waiting on the active fences of the destroyed timeline we need to signal with error all remaining fences on such timeline. This restore the default behaviour of the Android sw_sync framework, which Android still relies on. It was broken on the dma fence conversion a few years ago and never fixed. v2: Do not bother with cleanup do the list (Chris Wilson) Reviewed-by: Chris Wilson Signed-off-by: Dominik Behr Signed-off-by: Gustavo Padovan Link: https://patchwork.freedesktop.org/patch/msgid/20170907190246.16425-2-gustavo@padovan.org Cc: Jisheng Zhang Signed-off-by: Greg Kroah-Hartman --- drivers/dma-buf/sw_sync.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c index 38cc7389a6c1..24f83f9eeaed 100644 --- a/drivers/dma-buf/sw_sync.c +++ b/drivers/dma-buf/sw_sync.c @@ -321,8 +321,16 @@ static int sw_sync_debugfs_open(struct inode *inode, struct file *file) static int sw_sync_debugfs_release(struct inode *inode, struct file *file) { struct sync_timeline *obj = file->private_data; + struct sync_pt *pt, *next; + + spin_lock_irq(&obj->lock); + + list_for_each_entry_safe(pt, next, &obj->pt_list, link) { + dma_fence_set_error(&pt->base, -ENOENT); + dma_fence_signal_locked(&pt->base); + } - smp_wmb(); + spin_unlock_irq(&obj->lock); sync_timeline_put(obj); return 0; From a633c7ed4a4c6097475de05d950e35cf92b8dec7 Mon Sep 17 00:00:00 2001 From: Gilad Ben-Yossef Date: Thu, 9 Nov 2017 09:16:09 +0000 Subject: [PATCH 0162/1013] staging: ccree: fix leak of import() after init() commit c5f39d07860c35e5e4c63188139465af790f86ce upstream. 
crypto_ahash_import() may be called either after crypto_ahash_init() or without such call. Right now we always internally call init() as part of import(), thus leaking memory and mappings if the user has already called init() herself. Fix this by only calling init() internally if the state is not already initialized. Fixes: commit 454527d0d94f ("staging: ccree: fix hash import/export") Signed-off-by: Gilad Ben-Yossef Reviewed-by: Dan Carpenter Signed-off-by: Greg Kroah-Hartman --- drivers/staging/ccree/ssi_hash.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/staging/ccree/ssi_hash.c b/drivers/staging/ccree/ssi_hash.c index 13291aeaf350..f72ca485c86f 100644 --- a/drivers/staging/ccree/ssi_hash.c +++ b/drivers/staging/ccree/ssi_hash.c @@ -1790,9 +1790,12 @@ static int ssi_ahash_import(struct ahash_request *req, const void *in) } in += sizeof(u32); - rc = ssi_hash_init(state, ctx); - if (rc) - goto out; + /* call init() to allocate bufs if the user hasn't */ + if (!state->digest_buff) { + rc = ssi_hash_init(state, ctx); + if (rc) + goto out; + } dma_sync_single_for_cpu(dev, state->digest_buff_dma_addr, ctx->inter_digestsize, DMA_BIDIRECTIONAL); From 755532dd5f643f1a1da479bb133a946ce5c97905 Mon Sep 17 00:00:00 2001 From: Mike Looijmans Date: Thu, 9 Nov 2017 13:16:46 +0100 Subject: [PATCH 0163/1013] usb: hub: Cycle HUB power when initialization fails commit 973593a960ddac0f14f0d8877d2d0abe0afda795 upstream. Sometimes the USB device gets confused about the state of the initialization and the connection fails. In particular, the device thinks that it's already set up and running while the host thinks the device still needs to be configured. To work around this issue, power-cycle the hub's output to issue a sort of "reset" to the device. This makes the device restart its state machine and then the initialization succeeds. This fixes problems where the kernel reports a list of errors like this: usb 1-1.3: device not accepting address 19, error -71 The end result is a non-functioning device. 
After this patch, the sequence becomes like this: usb 1-1.3: new high-speed USB device number 18 using ci_hdrc usb 1-1.3: device not accepting address 18, error -71 usb 1-1.3: new high-speed USB device number 19 using ci_hdrc usb 1-1.3: device not accepting address 19, error -71 usb 1-1-port3: attempt power cycle usb 1-1.3: new high-speed USB device number 21 using ci_hdrc usb-storage 1-1.3:1.2: USB Mass Storage device detected Signed-off-by: Mike Looijmans Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/hub.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c index e9ce6bb0b22d..8f7d94239ee3 100644 --- a/drivers/usb/core/hub.c +++ b/drivers/usb/core/hub.c @@ -4935,6 +4935,15 @@ static void hub_port_connect(struct usb_hub *hub, int port1, u16 portstatus, usb_put_dev(udev); if ((status == -ENOTCONN) || (status == -ENOTSUPP)) break; + + /* When halfway through our retry count, power-cycle the port */ + if (i == (SET_CONFIG_TRIES / 2) - 1) { + dev_info(&port_dev->dev, "attempt power cycle\n"); + usb_hub_set_port_power(hdev, hub, port1, false); + msleep(2 * hub_power_on_good_delay(hub)); + usb_hub_set_port_power(hdev, hub, port1, true); + msleep(hub_power_on_good_delay(hub)); + } } if (hub->hdev->parent || !hcd->driver->port_handed_over || From 8bc546bbfec9be32108af2684a2766629acc052d Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Sat, 11 Nov 2017 16:31:18 +0100 Subject: [PATCH 0164/1013] USB: ulpi: fix bus-node lookup commit 33c309ebc797b908029fd3a0851aefe697e9b598 upstream. Fix bus-node lookup during registration, which ended up searching the whole device tree depth-first starting at the parent (or grand parent) rather than just matching on its children. To make things worse, the parent (or grand-parent) node could end being prematurely freed as well. Fixes: ef6a7bcfb01c ("usb: ulpi: Support device discovery via DT") Reported-by: Peter Robinson Reported-by: Stephen Boyd Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/common/ulpi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/usb/common/ulpi.c b/drivers/usb/common/ulpi.c index 4aa5195db8ea..e02acfb1ca95 100644 --- a/drivers/usb/common/ulpi.c +++ b/drivers/usb/common/ulpi.c @@ -183,9 +183,9 @@ static int ulpi_of_register(struct ulpi *ulpi) /* Find a ulpi bus underneath the parent or the grandparent */ parent = ulpi->dev.parent; if (parent->of_node) - np = of_find_node_by_name(parent->of_node, "ulpi"); + np = of_get_child_by_name(parent->of_node, "ulpi"); else if (parent->parent && parent->parent->of_node) - np = of_find_node_by_name(parent->parent->of_node, "ulpi"); + np = of_get_child_by_name(parent->parent->of_node, "ulpi"); if (!np) return 0; From 33288619482a876f2575ed308ac2c1c3199a6a1a Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Fri, 1 Dec 2017 13:41:19 +0200 Subject: [PATCH 0165/1013] xhci: Don't show incorrect WARN message about events for empty rings commit e4ec40ec4b260efcca15089de4285a0a3411259b upstream. xHC can generate two events for a short transfer if the short TRB and last TRB in the TD are not the same TRB. The driver will handle the TD after the first short event, and remove it from its internal list. Driver then incorrectly prints a warning for the second event: "WARN Event TRB for slot x ep y with no TDs queued" Fix this by not printing a warning if we get a event on a empty list if the previous event was a short event. 
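Condensed, the empty-list path now tolerates one more benign case besides the stopped-endpoint ones (a sketch of the logic; the real hunk follows below):

    if (list_empty(&ep_ring->td_list)) {
            /* don't warn for a stopped/suspended endpoint, nor for the
             * second event of a short TD that was already handled and
             * removed from the list */
            if (!(trb_comp_code == COMP_STOPPED ||
                  trb_comp_code == COMP_STOPPED_LENGTH_INVALID ||
                  ep_ring->last_td_was_short))
                    xhci_warn(xhci, "WARN Event TRB with no TDs queued?\n");
    }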
Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 82c746e2d85c..353520005c13 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2486,12 +2486,16 @@ static int handle_tx_event(struct xhci_hcd *xhci, */ if (list_empty(&ep_ring->td_list)) { /* - * A stopped endpoint may generate an extra completion - * event if the device was suspended. Don't print - * warnings. + * Don't print wanings if it's due to a stopped endpoint + * generating an extra completion event if the device + * was suspended. Or, a event for the last TRB of a + * short TD we already got a short event for. + * The short TD is already removed from the TD list. */ + if (!(trb_comp_code == COMP_STOPPED || - trb_comp_code == COMP_STOPPED_LENGTH_INVALID)) { + trb_comp_code == COMP_STOPPED_LENGTH_INVALID || + ep_ring->last_td_was_short)) { xhci_warn(xhci, "WARN Event TRB for slot %d ep %d with no TDs queued?\n", TRB_TO_SLOT_ID(le32_to_cpu(event->flags)), ep_index); From 4346c775cf707a703e2b305315e37b9beee30bd6 Mon Sep 17 00:00:00 2001 From: Yu Chen Date: Fri, 1 Dec 2017 13:41:20 +0200 Subject: [PATCH 0166/1013] usb: xhci: fix panic in xhci_free_virt_devices_depth_first commit 80e457699a8dbdd70f2d26911e46f538645c55fc upstream. Check vdev->real_port 0 to avoid panic [ 9.261347] [] xhci_free_virt_devices_depth_first+0x58/0x108 [ 9.261352] [] xhci_mem_cleanup+0x1bc/0x570 [ 9.261355] [] xhci_stop+0x140/0x1c8 [ 9.261365] [] usb_remove_hcd+0xfc/0x1d0 [ 9.261369] [] xhci_plat_remove+0x6c/0xa8 [ 9.261377] [] platform_drv_remove+0x2c/0x70 [ 9.261384] [] __device_release_driver+0x80/0x108 [ 9.261387] [] device_release_driver+0x2c/0x40 [ 9.261392] [] bus_remove_device+0xe0/0x120 [ 9.261396] [] device_del+0x114/0x210 [ 9.261399] [] platform_device_del+0x30/0xa0 [ 9.261403] [] dwc3_otg_work+0x204/0x488 [ 9.261407] [] event_work+0x304/0x5b8 [ 9.261414] [] process_one_work+0x148/0x490 [ 9.261417] [] worker_thread+0x50/0x4a0 [ 9.261421] [] kthread+0xe8/0x100 [ 9.261427] [] ret_from_fork+0x10/0x50 The problem can occur if xhci_plat_remove() is called shortly after xhci_plat_probe(). While xhci_free_virt_devices_depth_first been called before the device has been setup and get real_port initialized. The problem occurred on Hikey960 and was reproduced by Guenter Roeck on Kevin with chromeos-4.4. 
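The underlying hazard is that vdev->real_port is only filled in once the device has actually been set up, so tearing down a freshly probed host can reach the TT cleanup with real_port still zero and index the root-hub bandwidth table out of bounds, roughly:

    /* with vdev->real_port == 0 this evaluates to &xhci->rh_bw[-1].tts */
    tt_list_head = &(xhci->rh_bw[vdev->real_port - 1].tts);

Hence the added bounds check in the hunk below, which skips the TT walk but still frees the leaf device.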
Fixes: ee8665e28e8d ("xhci: free xhci virtual devices with leaf nodes first") Cc: Guenter Roeck Reviewed-by: Guenter Roeck Tested-by: Guenter Roeck Signed-off-by: Fan Ning Signed-off-by: Li Rui Signed-off-by: yangdi Signed-off-by: Yu Chen Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mem.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index 2a82c927ded2..97f30eb7dac0 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -947,6 +947,12 @@ void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id) if (!vdev) return; + if (vdev->real_port == 0 || + vdev->real_port > HCS_MAX_PORTS(xhci->hcs_params1)) { + xhci_dbg(xhci, "Bad vdev->real_port.\n"); + goto out; + } + tt_list_head = &(xhci->rh_bw[vdev->real_port - 1].tts); list_for_each_entry_safe(tt_info, next, tt_list_head, tt_list) { /* is this a hub device that added a tt_info to the tts list */ @@ -960,6 +966,7 @@ void xhci_free_virt_devices_depth_first(struct xhci_hcd *xhci, int slot_id) } } } +out: /* we are now at a leaf device */ xhci_free_virt_device(xhci, slot_id); } From 903b2d14d0e57e48fd80d5a9d35d9fbf3c1e63ea Mon Sep 17 00:00:00 2001 From: Masakazu Mokuno Date: Fri, 10 Nov 2017 01:25:50 +0900 Subject: [PATCH 0167/1013] USB: core: Add type-specific length check of BOS descriptors commit 81cf4a45360f70528f1f64ba018d61cb5767249a upstream. As most of BOS descriptors are longer in length than their header 'struct usb_dev_cap_header', comparing solely with it is not sufficient to avoid out-of-bounds access to BOS descriptors. This patch adds descriptor type specific length check in usb_get_bos_descriptor() to fix the issue. Signed-off-by: Masakazu Mokuno Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/config.c | 28 ++++++++++++++++++++++++---- include/uapi/linux/usb/ch9.h | 3 +++ 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index 883549ee946c..c42a3e63eb07 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -905,14 +905,25 @@ void usb_release_bos_descriptor(struct usb_device *dev) } } +static const __u8 bos_desc_len[256] = { + [USB_CAP_TYPE_WIRELESS_USB] = USB_DT_USB_WIRELESS_CAP_SIZE, + [USB_CAP_TYPE_EXT] = USB_DT_USB_EXT_CAP_SIZE, + [USB_SS_CAP_TYPE] = USB_DT_USB_SS_CAP_SIZE, + [USB_SSP_CAP_TYPE] = USB_DT_USB_SSP_CAP_SIZE(1), + [CONTAINER_ID_TYPE] = USB_DT_USB_SS_CONTN_ID_SIZE, + [USB_PTM_CAP_TYPE] = USB_DT_USB_PTM_ID_SIZE, +}; + /* Get BOS descriptor set */ int usb_get_bos_descriptor(struct usb_device *dev) { struct device *ddev = &dev->dev; struct usb_bos_descriptor *bos; struct usb_dev_cap_header *cap; + struct usb_ssp_cap_descriptor *ssp_cap; unsigned char *buffer; - int length, total_len, num, i; + int length, total_len, num, i, ssac; + __u8 cap_type; int ret; bos = kzalloc(sizeof(struct usb_bos_descriptor), GFP_KERNEL); @@ -965,7 +976,13 @@ int usb_get_bos_descriptor(struct usb_device *dev) dev->bos->desc->bNumDeviceCaps = i; break; } + cap_type = cap->bDevCapabilityType; length = cap->bLength; + if (bos_desc_len[cap_type] && length < bos_desc_len[cap_type]) { + dev->bos->desc->bNumDeviceCaps = i; + break; + } + total_len -= length; if (cap->bDescriptorType != USB_DT_DEVICE_CAPABILITY) { @@ -973,7 +990,7 @@ int usb_get_bos_descriptor(struct usb_device *dev) continue; } - switch (cap->bDevCapabilityType) { + switch (cap_type) { case USB_CAP_TYPE_WIRELESS_USB: /* Wireless USB cap 
descriptor is handled by wusb */ break; @@ -986,8 +1003,11 @@ int usb_get_bos_descriptor(struct usb_device *dev) (struct usb_ss_cap_descriptor *)buffer; break; case USB_SSP_CAP_TYPE: - dev->bos->ssp_cap = - (struct usb_ssp_cap_descriptor *)buffer; + ssp_cap = (struct usb_ssp_cap_descriptor *)buffer; + ssac = (le32_to_cpu(ssp_cap->bmAttributes) & + USB_SSP_SUBLINK_SPEED_ATTRIBS) + 1; + if (length >= USB_DT_USB_SSP_CAP_SIZE(ssac)) + dev->bos->ssp_cap = ssp_cap; break; case CONTAINER_ID_TYPE: dev->bos->ss_id = diff --git a/include/uapi/linux/usb/ch9.h b/include/uapi/linux/usb/ch9.h index cec06625f407..8512777889b0 100644 --- a/include/uapi/linux/usb/ch9.h +++ b/include/uapi/linux/usb/ch9.h @@ -876,6 +876,8 @@ struct usb_wireless_cap_descriptor { /* Ultra Wide Band */ __u8 bReserved; } __attribute__((packed)); +#define USB_DT_USB_WIRELESS_CAP_SIZE 11 + /* USB 2.0 Extension descriptor */ #define USB_CAP_TYPE_EXT 2 @@ -1068,6 +1070,7 @@ struct usb_ptm_cap_descriptor { __u8 bDevCapabilityType; } __attribute__((packed)); +#define USB_DT_USB_PTM_ID_SIZE 3 /* * The size of the descriptor for the Sublink Speed Attribute Count * (SSAC) specified in bmAttributes[4:0]. From 15f12be9860e380ccc809f0bb939b9ea385be9b1 Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Thu, 23 Nov 2017 16:39:52 +0100 Subject: [PATCH 0168/1013] USB: usbfs: Filter flags passed in from user space commit 446f666da9f019ce2ffd03800995487e79a91462 upstream. USBDEVFS_URB_ISO_ASAP must be accepted only for ISO endpoints. Improve sanity checking. Reported-by: Andrey Konovalov Signed-off-by: Oliver Neukum Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/devio.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/drivers/usb/core/devio.c b/drivers/usb/core/devio.c index 98c666ef9a57..ab245352f102 100644 --- a/drivers/usb/core/devio.c +++ b/drivers/usb/core/devio.c @@ -1455,14 +1455,18 @@ static int proc_do_submiturb(struct usb_dev_state *ps, struct usbdevfs_urb *uurb int number_of_packets = 0; unsigned int stream_id = 0; void *buf; - - if (uurb->flags & ~(USBDEVFS_URB_ISO_ASAP | - USBDEVFS_URB_SHORT_NOT_OK | + unsigned long mask = USBDEVFS_URB_SHORT_NOT_OK | USBDEVFS_URB_BULK_CONTINUATION | USBDEVFS_URB_NO_FSBR | USBDEVFS_URB_ZERO_PACKET | - USBDEVFS_URB_NO_INTERRUPT)) - return -EINVAL; + USBDEVFS_URB_NO_INTERRUPT; + /* USBDEVFS_URB_ISO_ASAP is a special case */ + if (uurb->type == USBDEVFS_URB_TYPE_ISO) + mask |= USBDEVFS_URB_ISO_ASAP; + + if (uurb->flags & ~mask) + return -EINVAL; + if ((unsigned int)uurb->buffer_length >= USBFS_XFER_MAX) return -EINVAL; if (uurb->buffer_length > 0 && !uurb->buffer) From de667b08c34c61b59fb1a71ba0eff648550a7799 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Tue, 7 Nov 2017 16:45:04 +0000 Subject: [PATCH 0169/1013] usb: host: fix incorrect updating of offset commit 1d5a31582ef046d3b233f0da1a68ae26519b2f0a upstream. The variable temp is incorrectly being updated, instead it should be offset otherwise the loop just reads the same capability value and loops forever. Thanks to Alan Stern for pointing out the correct fix to my original fix. 
Fix also cleans up clang warning: drivers/usb/host/ehci-dbg.c:840:4: warning: Value stored to 'temp' is never read Fixes: d49d43174400 ("USB: misc ehci updates") Signed-off-by: Colin Ian King Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/ehci-dbg.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/host/ehci-dbg.c b/drivers/usb/host/ehci-dbg.c index cbb9b8e12c3c..8c5a6fee4dfd 100644 --- a/drivers/usb/host/ehci-dbg.c +++ b/drivers/usb/host/ehci-dbg.c @@ -837,7 +837,7 @@ static ssize_t fill_registers_buffer(struct debug_buffer *buf) default: /* unknown */ break; } - temp = (cap >> 8) & 0xff; + offset = (cap >> 8) & 0xff; } } #endif From c3b0874866f2282f51295dc4cf69672b2caf6039 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 4 Dec 2017 17:24:54 -0800 Subject: [PATCH 0170/1013] locking/refcounts: Do not force refcount_t usage as GPL-only export commit b562c171cf011d297059bd0265742eb5fab0ad2f upstream. The refcount_t protection on x86 was not intended to use the stricter GPL export. This adjusts the linkage again to avoid a regression in the availability of the refcount API. Reported-by: Dave Airlie Fixes: 7a46ec0e2f48 ("locking/refcounts, x86/asm: Implement fast refcount overflow protection") Signed-off-by: Kees Cook Signed-off-by: Linus Torvalds Cc: Ivan Kozik Cc: Thomas Backlund Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/extable.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/extable.c b/arch/x86/mm/extable.c index 3321b446b66c..30bc4812ceb8 100644 --- a/arch/x86/mm/extable.c +++ b/arch/x86/mm/extable.c @@ -82,7 +82,7 @@ bool ex_handler_refcount(const struct exception_table_entry *fixup, return true; } -EXPORT_SYMBOL_GPL(ex_handler_refcount); +EXPORT_SYMBOL(ex_handler_refcount); /* * Handler for when we fail to restore a task's FPU state. We should never get From 64138f0adb25ca8f34baa57af33260b05efe2874 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 10 Dec 2017 13:40:45 +0100 Subject: [PATCH 0171/1013] Linux 4.14.5 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ba1648c093fe..43ac7bdb10ad 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 4 +SUBLEVEL = 5 EXTRAVERSION = NAME = Petit Gorille From cb6c13d914e0f5b7badc3ec455114e4b45177c2a Mon Sep 17 00:00:00 2001 From: Yoshihiro Shimoda Date: Mon, 13 Nov 2017 17:59:18 +0900 Subject: [PATCH 0172/1013] usb: gadget: udc: renesas_usb3: fix number of the pipes commit a58204ab91ad8cae4d8474aa0ba5d1fc504860c9 upstream. This controller on R-Car Gen3 has 6 pipes that included PIPE 0 for control actually. But, the datasheet has error in writing as it has 31 pipes. (However, the previous code defined 30 pipes wrongly...) Anyway, this patch fixes it. 
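In other words the hardware has PIPE 0 (control) plus five further pipes, so the driver-side bound is simply (as in the hunk below):

    #define USB3_MAX_NUM_PIPES      6       /* This includes PIPE 0 */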
Fixes: 746bfe63bba3 ("usb: gadget: renesas_usb3: add support for Renesas USB3.0 peripheral controller") Signed-off-by: Yoshihiro Shimoda Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/udc/renesas_usb3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/gadget/udc/renesas_usb3.c b/drivers/usb/gadget/udc/renesas_usb3.c index 63a206122058..6b3e8adb64e6 100644 --- a/drivers/usb/gadget/udc/renesas_usb3.c +++ b/drivers/usb/gadget/udc/renesas_usb3.c @@ -254,7 +254,7 @@ #define USB3_EP0_SS_MAX_PACKET_SIZE 512 #define USB3_EP0_HSFS_MAX_PACKET_SIZE 64 #define USB3_EP0_BUF_SIZE 8 -#define USB3_MAX_NUM_PIPES 30 +#define USB3_MAX_NUM_PIPES 6 /* This includes PIPE 0 */ #define USB3_WAIT_US 3 #define USB3_DMA_NUM_SETTING_AREA 4 /* From 28ddc2b45f5f409b905dc78e79d959377789fc12 Mon Sep 17 00:00:00 2001 From: Roger Quadros Date: Tue, 31 Oct 2017 15:56:29 +0200 Subject: [PATCH 0173/1013] usb: gadget: core: Fix ->udc_set_speed() speed handling commit a4f0927ef588cf62bb864707261482c874352942 upstream. Currently UDC core calls ->udc_set_speed() with the speed parameter containing the maximum speed supported by the gadget function driver. This might very well be more than that supported by the UDC controller driver. Select the lesser of the 2 speeds so both UDC and gadget function driver are operating within limits. This fixes PHY Erratic errors and 2 second enumeration delay on TI's AM437x platforms. Fixes: 6099eca796ae ("usb: gadget: core: introduce ->udc_set_speed() method") Reported-by: Dylan Howey Signed-off-by: Roger Quadros Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/udc/core.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c index d41d07aae0ce..def1b05ffca0 100644 --- a/drivers/usb/gadget/udc/core.c +++ b/drivers/usb/gadget/udc/core.c @@ -1080,8 +1080,12 @@ static inline void usb_gadget_udc_stop(struct usb_udc *udc) static inline void usb_gadget_udc_set_speed(struct usb_udc *udc, enum usb_device_speed speed) { - if (udc->gadget->ops->udc_set_speed) - udc->gadget->ops->udc_set_speed(udc->gadget, speed); + if (udc->gadget->ops->udc_set_speed) { + enum usb_device_speed s; + + s = min(speed, udc->gadget->max_speed); + udc->gadget->ops->udc_set_speed(udc->gadget, s); + } } /** From d415ccdd0be1a745904a7b12b87437463aa35a8d Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Fri, 3 Nov 2017 15:30:52 +0100 Subject: [PATCH 0174/1013] serdev: ttyport: add missing receive_buf sanity checks commit eb281683621b71ab9710d9dccbbef0c2e1769c97 upstream. The receive_buf tty-port callback should return the number of bytes accepted and must specifically never return a negative errno (or a value larger than the buffer size) to the tty layer. A serdev driver not providing a receive_buf callback would currently cause the flush_to_ldisc() worker to spin in a tight loop when the tty buffer pointers are incremented with -EINVAL (-22) after data has been received. A serdev driver occasionally returning a negative errno (or a too large byte count) could cause information leaks or crashes when accessing memory outside the tty buffers in consecutive callbacks. 
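The contract being enforced is that the tty layer must only ever see a byte count in the range 0..count; in sketch form (the real hunk below also adds a one-time warning when a driver misbehaves):

    ret = serdev_controller_receive_buf(ctrl, cp, count);
    if (ret < 0)
            return 0;       /* never feed a -errno back to the tty layer */
    if (ret > count)
            return count;   /* never claim more bytes than were offered */
    return ret;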
Fixes: cd6484e1830b ("serdev: Introduce new bus for serial attached devices") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serdev/serdev-ttyport.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/drivers/tty/serdev/serdev-ttyport.c b/drivers/tty/serdev/serdev-ttyport.c index 302018d67efa..d4efe4354401 100644 --- a/drivers/tty/serdev/serdev-ttyport.c +++ b/drivers/tty/serdev/serdev-ttyport.c @@ -35,11 +35,22 @@ static int ttyport_receive_buf(struct tty_port *port, const unsigned char *cp, { struct serdev_controller *ctrl = port->client_data; struct serport *serport = serdev_controller_get_drvdata(ctrl); + int ret; if (!test_bit(SERPORT_ACTIVE, &serport->flags)) return 0; - return serdev_controller_receive_buf(ctrl, cp, count); + ret = serdev_controller_receive_buf(ctrl, cp, count); + + dev_WARN_ONCE(&ctrl->dev, ret < 0 || ret > count, + "receive_buf returns %d (count = %zu)\n", + ret, count); + if (ret < 0) + return 0; + else if (ret > count) + return count; + + return ret; } static void ttyport_write_wakeup(struct tty_port *port) From 5a3da8bd93c78b6a4aed19c83cb8f493dfc4dfef Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Fri, 3 Nov 2017 15:30:55 +0100 Subject: [PATCH 0175/1013] serdev: ttyport: fix NULL-deref on hangup commit 8bcd4e6a8decac251d55c4377e2e67f052777ce0 upstream. Make sure to use a properly refcounted tty_struct in write_wake up to avoid dereferencing a NULL-pointer when a port is being hung up. Fixes: bed35c6dfa6a ("serdev: add a tty port controller driver") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serdev/serdev-ttyport.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/tty/serdev/serdev-ttyport.c b/drivers/tty/serdev/serdev-ttyport.c index d4efe4354401..ce7fc44f2fe2 100644 --- a/drivers/tty/serdev/serdev-ttyport.c +++ b/drivers/tty/serdev/serdev-ttyport.c @@ -57,12 +57,19 @@ static void ttyport_write_wakeup(struct tty_port *port) { struct serdev_controller *ctrl = port->client_data; struct serport *serport = serdev_controller_get_drvdata(ctrl); + struct tty_struct *tty; + + tty = tty_port_tty_get(port); + if (!tty) + return; - if (test_and_clear_bit(TTY_DO_WRITE_WAKEUP, &port->tty->flags) && + if (test_and_clear_bit(TTY_DO_WRITE_WAKEUP, &tty->flags) && test_bit(SERPORT_ACTIVE, &serport->flags)) serdev_controller_write_wakeup(ctrl); - wake_up_interruptible_poll(&port->tty->write_wait, POLLOUT); + wake_up_interruptible_poll(&tty->write_wait, POLLOUT); + + tty_kref_put(tty); } static const struct tty_port_client_operations client_ops = { From ce42ed5ae8982360dba581acfafdaff88f2b3fbe Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Fri, 3 Nov 2017 15:30:56 +0100 Subject: [PATCH 0176/1013] serdev: ttyport: fix tty locking in close commit 90dbad8cd6efccbdce109d5ef0724f8434a6cdde upstream. Make sure to hold the tty lock as required when calling tty-driver close() (e.g. to avoid racing with hangup()). Note that the serport active flag is currently set under the lock at controller open, but really isn't protected by it. 
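The shape of the fix is simply to take the tty lock around the driver's close callback, as the tty core itself does, so close can no longer race with hangup (sketch; the full hunk is below):

    tty_lock(tty);
    if (tty->ops->close)
            tty->ops->close(tty, NULL);     /* serialized against hangup() */
    tty_unlock(tty);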
Fixes: cd6484e1830b ("serdev: Introduce new bus for serial attached devices") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serdev/serdev-ttyport.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/tty/serdev/serdev-ttyport.c b/drivers/tty/serdev/serdev-ttyport.c index ce7fc44f2fe2..7f785d77ba7f 100644 --- a/drivers/tty/serdev/serdev-ttyport.c +++ b/drivers/tty/serdev/serdev-ttyport.c @@ -149,8 +149,10 @@ static void ttyport_close(struct serdev_controller *ctrl) clear_bit(SERPORT_ACTIVE, &serport->flags); + tty_lock(tty); if (tty->ops->close) tty->ops->close(tty, NULL); + tty_unlock(tty); tty_release_struct(tty, serport->tty_idx); } From 4dc9c1cfa9fffed0ec812a85e4b06a68efb6afea Mon Sep 17 00:00:00 2001 From: John Keeping Date: Mon, 27 Nov 2017 18:15:40 +0000 Subject: [PATCH 0177/1013] usb: f_fs: Force Reserved1=1 in OS_DESC_EXT_COMPAT commit a3acc696085e112733d191a77b106e67a4fa110b upstream. The specification says that the Reserved1 field in OS_DESC_EXT_COMPAT must have the value "1", but when this feature was first implemented we rejected any non-zero values. This was adjusted to accept all non-zero values (while now rejecting zero) in commit 53642399aa71 ("usb: gadget: f_fs: Fix wrong check on reserved1 of OS_DESC_EXT_COMPAT"), but that breaks any userspace programs that worked previously by returning EINVAL when Reserved1 == 0 which was previously the only value that succeeded! If we just set the field to "1" ourselves, both old and new userspace programs continue to work correctly and, as a bonus, old programs are now compliant with the specification without having to fix anything themselves. Fixes: 53642399aa71 ("usb: gadget: f_fs: Fix wrong check on reserved1 of OS_DESC_EXT_COMPAT") Signed-off-by: John Keeping Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_fs.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index ef8f7d63a8f0..0202e5132fa7 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -2286,9 +2286,18 @@ static int __ffs_data_do_os_desc(enum ffs_os_desc_type type, int i; if (len < sizeof(*d) || - d->bFirstInterfaceNumber >= ffs->interfaces_count || - !d->Reserved1) + d->bFirstInterfaceNumber >= ffs->interfaces_count) return -EINVAL; + if (d->Reserved1 != 1) { + /* + * According to the spec, Reserved1 must be set to 1 + * but older kernels incorrectly rejected non-zero + * values. We fix it here to avoid returning EINVAL + * in response to values we used to accept. + */ + pr_debug("usb_ext_compat_desc::Reserved1 forced to 1\n"); + d->Reserved1 = 1; + } for (i = 0; i < ARRAY_SIZE(d->Reserved2); ++i) if (d->Reserved2[i]) return -EINVAL; From ba4eed1bd4d82ed591f07eee628b1efade7e6764 Mon Sep 17 00:00:00 2001 From: Martin Kelly Date: Mon, 27 Nov 2017 15:49:16 -0800 Subject: [PATCH 0178/1013] can: mcba_usb: fix device disconnect bug commit 1cb35a33a28394fd711bb26ddf3a564f4e9d9125 upstream. Currently, when you disconnect the device, the driver infinitely resubmits all URBs, so you see: Rx URB aborted (-32) in an infinite loop. Fix this by catching -EPIPE (what we get in urb->status when the device disconnects) and not resubmitting. With this patch, I can plug and unplug many times and the driver recovers correctly. 
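The usual idiom in a USB receive completion handler is to treat "device went away" statuses as terminal instead of resubmitting; in sketch form (the actual change is the one-liner below adding -EPIPE to the existing cases; the resubmit label and surrounding flow here are illustrative):

    switch (urb->status) {
    case 0:                 /* success, go on and process the data */
            break;
    case -ENOENT:
    case -EPIPE:            /* device disconnected */
    case -ESHUTDOWN:
            return;         /* do not resubmit */
    default:
            netdev_info(netdev, "Rx URB aborted (%d)\n", urb->status);
            /* skip the data, but resubmit the URB further down */
            goto resubmit_urb;
    }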
Signed-off-by: Martin Kelly Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/mcba_usb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/usb/mcba_usb.c b/drivers/net/can/usb/mcba_usb.c index 7f0272558bef..a884d31fe324 100644 --- a/drivers/net/can/usb/mcba_usb.c +++ b/drivers/net/can/usb/mcba_usb.c @@ -592,6 +592,7 @@ static void mcba_usb_read_bulk_callback(struct urb *urb) break; case -ENOENT: + case -EPIPE: case -ESHUTDOWN: return; From 5b65e2916c4ba62ce543cd1366a6649a9ff5bcb2 Mon Sep 17 00:00:00 2001 From: Stephane Grosjean Date: Thu, 23 Nov 2017 15:44:35 +0100 Subject: [PATCH 0179/1013] can: peak/pci: fix potential bug when probe() fails commit 5c2cb02edf79ad79d9b8d07c6d52243a948c4c9f upstream. PCI/PCIe drivers for PEAK-System CAN/CAN-FD interfaces do some access to the PCI config during probing. In case one of these accesses fails, a POSITIVE PCIBIOS_xxx error code is returned back. This POSITIVE error code MUST be converted into a NEGATIVE errno for the probe() function to indicate it failed. Using the pcibios_err_to_errno() function, we make sure that the return code will always be negative. Signed-off-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/peak_canfd/peak_pciefd_main.c | 5 ++++- drivers/net/can/sja1000/peak_pci.c | 5 ++++- 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/peak_canfd/peak_pciefd_main.c b/drivers/net/can/peak_canfd/peak_pciefd_main.c index b4efd711f824..788c3464a3b0 100644 --- a/drivers/net/can/peak_canfd/peak_pciefd_main.c +++ b/drivers/net/can/peak_canfd/peak_pciefd_main.c @@ -825,7 +825,10 @@ static int peak_pciefd_probe(struct pci_dev *pdev, err_disable_pci: pci_disable_device(pdev); - return err; + /* pci_xxx_config_word() return positive PCIBIOS_xxx error codes while + * the probe() function must return a negative errno in case of failure + * (err is unchanged if negative) */ + return pcibios_err_to_errno(err); } /* free the board structure object, as well as its resources: */ diff --git a/drivers/net/can/sja1000/peak_pci.c b/drivers/net/can/sja1000/peak_pci.c index 131026fbc2d7..5adc95c922ee 100644 --- a/drivers/net/can/sja1000/peak_pci.c +++ b/drivers/net/can/sja1000/peak_pci.c @@ -717,7 +717,10 @@ static int peak_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent) failure_disable_pci: pci_disable_device(pdev); - return err; + /* pci_xxx_config_word() return positive PCIBIOS_xxx error codes while + * the probe() function must return a negative errno in case of failure + * (err is unchanged if negative) */ + return pcibios_err_to_errno(err); } static void peak_pci_remove(struct pci_dev *pdev) From 39797254631553b51b2e64d5160ff8709a66254c Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Mon, 27 Nov 2017 09:18:21 +0100 Subject: [PATCH 0180/1013] can: flexcan: fix VF610 state transition issue commit 29c64b17a0bc72232acc45e9533221d88a262efb upstream. Enable FLEXCAN_QUIRK_BROKEN_PERR_STATE for VF610 to report correct state transitions. Tested-by: Mirza Krak Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/flexcan.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c index a13a4896a8bd..c4d1140116ea 100644 --- a/drivers/net/can/flexcan.c +++ b/drivers/net/can/flexcan.c @@ -189,7 +189,7 @@ * MX35 FlexCAN2 03.00.00.00 no no ? 
no no * MX53 FlexCAN2 03.00.00.00 yes no no no no * MX6s FlexCAN3 10.00.12.00 yes yes no no yes - * VF610 FlexCAN3 ? no yes ? yes yes? + * VF610 FlexCAN3 ? no yes no yes yes? * * Some SOCs do not have the RX_WARN & TX_WARN interrupt line connected. */ @@ -297,7 +297,8 @@ static const struct flexcan_devtype_data fsl_imx6q_devtype_data = { static const struct flexcan_devtype_data fsl_vf610_devtype_data = { .quirks = FLEXCAN_QUIRK_DISABLE_RXFG | FLEXCAN_QUIRK_ENABLE_EACEN_RRS | - FLEXCAN_QUIRK_DISABLE_MECR | FLEXCAN_QUIRK_USE_OFF_TIMESTAMP, + FLEXCAN_QUIRK_DISABLE_MECR | FLEXCAN_QUIRK_USE_OFF_TIMESTAMP | + FLEXCAN_QUIRK_BROKEN_PERR_STATE, }; static const struct can_bittiming_const flexcan_bittiming_const = { From 63f2bd86ccc0a2864bdb0d9bb7bb6735276019af Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Oliver=20St=C3=A4bler?= Date: Mon, 20 Nov 2017 14:45:15 +0100 Subject: [PATCH 0181/1013] can: ti_hecc: Fix napi poll return value for repoll MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit f6c23b174c3c96616514827407769cbcfc8005cf upstream. After commit d75b1ade567f ("net: less interrupt masking in NAPI") napi repoll is done only when work_done == budget. So we need to return budget if there are still packets to receive. Signed-off-by: Oliver Stäbler Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/ti_hecc.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/can/ti_hecc.c b/drivers/net/can/ti_hecc.c index 4d4941469cfc..db6ea936dc3f 100644 --- a/drivers/net/can/ti_hecc.c +++ b/drivers/net/can/ti_hecc.c @@ -637,6 +637,9 @@ static int ti_hecc_rx_poll(struct napi_struct *napi, int quota) mbx_mask = hecc_read(priv, HECC_CANMIM); mbx_mask |= HECC_TX_MBOX_MASK; hecc_write(priv, HECC_CANMIM, mbx_mask); + } else { + /* repoll is done only if whole budget is used */ + num_pkts = quota; } return num_pkts; From 896ae65aabc8438a3fa9405e3996bd59e3372055 Mon Sep 17 00:00:00 2001 From: Jimmy Assarsson Date: Tue, 21 Nov 2017 08:22:26 +0100 Subject: [PATCH 0182/1013] can: kvaser_usb: free buf in error paths commit 435019b48033138581a6171093b181fc6b4d3d30 upstream. The allocated buffer was not freed if usb_submit_urb() failed. Signed-off-by: Jimmy Assarsson Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/kvaser_usb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index 9b18d96ef526..075644591498 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -813,6 +813,7 @@ static int kvaser_usb_simple_msg_async(struct kvaser_usb_net_priv *priv, if (err) { netdev_err(netdev, "Error transmitting URB\n"); usb_unanchor_urb(urb); + kfree(buf); usb_free_urb(urb); return err; } @@ -1768,6 +1769,7 @@ static netdev_tx_t kvaser_usb_start_xmit(struct sk_buff *skb, spin_unlock_irqrestore(&priv->tx_contexts_lock, flags); usb_unanchor_urb(urb); + kfree(buf); stats->tx_dropped++; From 5250169e7390db30c60d35515b6b40501764164d Mon Sep 17 00:00:00 2001 From: Jimmy Assarsson Date: Tue, 21 Nov 2017 08:22:27 +0100 Subject: [PATCH 0183/1013] can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback() commit e84f44eb5523401faeb9cc1c97895b68e3cfb78d upstream. The conditon in the while-loop becomes true when actual_length is less than 2 (MSG_HEADER_LEN). In best case we end up with a former, already dispatched msg, that got msg->len greater than actual_length. 
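The root cause is C's usual arithmetic conversions: urb->actual_length is unsigned, so subtracting MSG_HEADER_LEN from a smaller value wraps around instead of going negative. A minimal standalone illustration (plain userspace C with example values, not driver code):

#include <stdio.h>

#define MSG_HEADER_LEN	2

int main(void)
{
	unsigned int actual_length = 1;	/* shorter than one message header */
	int pos = 0;

	/* unsigned arithmetic: 1 - 2 wraps to UINT_MAX, so the old check passes */
	if (pos <= actual_length - MSG_HEADER_LEN)
		printf("old check: loop runs, difference wrapped to %u\n",
		       actual_length - MSG_HEADER_LEN);

	/* casting the difference to int gives -1, and the loop is skipped */
	if (pos <= (int)(actual_length - MSG_HEADER_LEN))
		printf("fixed check: loop runs\n");
	else
		printf("fixed check: loop skipped\n");

	return 0;
}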
This will result in a "Format error" error printout. Problem seen when unplugging a Kvaser USB device connected to a vbox guest. warning: comparison between signed and unsigned integer expressions [-Wsign-compare] Signed-off-by: Jimmy Assarsson Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/kvaser_usb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index 075644591498..d87e330a20b3 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -1334,7 +1334,7 @@ static void kvaser_usb_read_bulk_callback(struct urb *urb) goto resubmit_urb; } - while (pos <= urb->actual_length - MSG_HEADER_LEN) { + while (pos <= (int)(urb->actual_length - MSG_HEADER_LEN)) { msg = urb->transfer_buffer + pos; /* The Kvaser firmware can only read and write messages that From 6f4fd93f349036ebeb136838444336f376f0e7db Mon Sep 17 00:00:00 2001 From: Jimmy Assarsson Date: Tue, 21 Nov 2017 08:22:28 +0100 Subject: [PATCH 0184/1013] can: kvaser_usb: ratelimit errors if incomplete messages are received commit 8bd13bd522ff7dfa0eb371921aeb417155f7a3be upstream. Avoid flooding the kernel log with "Formate error", if incomplete message are received. Signed-off-by: Jimmy Assarsson Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/kvaser_usb.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index d87e330a20b3..f95945915d20 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -609,8 +609,8 @@ static int kvaser_usb_wait_msg(const struct kvaser_usb *dev, u8 id, } if (pos + tmp->len > actual_len) { - dev_err(dev->udev->dev.parent, - "Format error\n"); + dev_err_ratelimited(dev->udev->dev.parent, + "Format error\n"); break; } @@ -1353,7 +1353,8 @@ static void kvaser_usb_read_bulk_callback(struct urb *urb) } if (pos + msg->len > urb->actual_length) { - dev_err(dev->udev->dev.parent, "Format error\n"); + dev_err_ratelimited(dev->udev->dev.parent, + "Format error\n"); break; } From d04d52a6f268b635b4e69cbe28beaf5fd19228e1 Mon Sep 17 00:00:00 2001 From: Martin Kelly Date: Tue, 5 Dec 2017 11:15:49 -0800 Subject: [PATCH 0185/1013] can: kvaser_usb: cancel urb on -EPIPE and -EPROTO commit 6aa8d5945502baf4687d80de59b7ac865e9e666b upstream. In mcba_usb, we have observed that when you unplug the device, the driver will endlessly resubmit failing URBs, which can cause CPU stalls. This issue is fixed in mcba_usb by catching the codes seen on device disconnect (-EPIPE and -EPROTO). This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it in the same way. 
Signed-off-by: Martin Kelly Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/kvaser_usb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/can/usb/kvaser_usb.c b/drivers/net/can/usb/kvaser_usb.c index f95945915d20..63587b8e6825 100644 --- a/drivers/net/can/usb/kvaser_usb.c +++ b/drivers/net/can/usb/kvaser_usb.c @@ -1326,6 +1326,8 @@ static void kvaser_usb_read_bulk_callback(struct urb *urb) case 0: break; case -ENOENT: + case -EPIPE: + case -EPROTO: case -ESHUTDOWN: return; default: From b96f17231dca5ddc2136999ca147cdeb842b2664 Mon Sep 17 00:00:00 2001 From: Martin Kelly Date: Tue, 5 Dec 2017 10:34:03 -0800 Subject: [PATCH 0186/1013] can: mcba_usb: cancel urb on -EPROTO commit c7f33023308f3142433b7379718af5f0c2c322a6 upstream. When we unplug the device, we can see both -EPIPE and -EPROTO depending on exact timing and what system we run on. If we continue to resubmit URBs, they will immediately fail, and they can cause stalls, especially on slower CPUs. Fix this by not resubmitting on -EPROTO, as we already do on -EPIPE. Signed-off-by: Martin Kelly Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/mcba_usb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/can/usb/mcba_usb.c b/drivers/net/can/usb/mcba_usb.c index a884d31fe324..e0c24abce16c 100644 --- a/drivers/net/can/usb/mcba_usb.c +++ b/drivers/net/can/usb/mcba_usb.c @@ -593,6 +593,7 @@ static void mcba_usb_read_bulk_callback(struct urb *urb) case -ENOENT: case -EPIPE: + case -EPROTO: case -ESHUTDOWN: return; From e7a80033fd226b71bd49602052ff8242355fd4a5 Mon Sep 17 00:00:00 2001 From: Martin Kelly Date: Tue, 5 Dec 2017 11:15:47 -0800 Subject: [PATCH 0187/1013] can: ems_usb: cancel urb on -EPIPE and -EPROTO commit bd352e1adfe0d02d3ea7c8e3fb19183dc317e679 upstream. In mcba_usb, we have observed that when you unplug the device, the driver will endlessly resubmit failing URBs, which can cause CPU stalls. This issue is fixed in mcba_usb by catching the codes seen on device disconnect (-EPIPE and -EPROTO). This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it in the same way. Signed-off-by: Martin Kelly Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/ems_usb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/can/usb/ems_usb.c b/drivers/net/can/usb/ems_usb.c index b3d02759c226..b00358297424 100644 --- a/drivers/net/can/usb/ems_usb.c +++ b/drivers/net/can/usb/ems_usb.c @@ -288,6 +288,8 @@ static void ems_usb_read_interrupt_callback(struct urb *urb) case -ECONNRESET: /* unlink */ case -ENOENT: + case -EPIPE: + case -EPROTO: case -ESHUTDOWN: return; From 421b93cb0f352c84964b701cf1db2e221c187d37 Mon Sep 17 00:00:00 2001 From: Martin Kelly Date: Tue, 5 Dec 2017 11:15:48 -0800 Subject: [PATCH 0188/1013] can: esd_usb2: cancel urb on -EPIPE and -EPROTO commit 7a31ced3de06e9878e4f9c3abe8f87d9344d8144 upstream. In mcba_usb, we have observed that when you unplug the device, the driver will endlessly resubmit failing URBs, which can cause CPU stalls. This issue is fixed in mcba_usb by catching the codes seen on device disconnect (-EPIPE and -EPROTO). This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it in the same way. 
Signed-off-by: Martin Kelly Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/esd_usb2.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/can/usb/esd_usb2.c b/drivers/net/can/usb/esd_usb2.c index 9fdb0f0bfa06..c6dcf93675c0 100644 --- a/drivers/net/can/usb/esd_usb2.c +++ b/drivers/net/can/usb/esd_usb2.c @@ -393,6 +393,8 @@ static void esd_usb2_read_bulk_callback(struct urb *urb) break; case -ENOENT: + case -EPIPE: + case -EPROTO: case -ESHUTDOWN: return; From 67464fbca35ccfbe2f5e957ff2a31334390b7947 Mon Sep 17 00:00:00 2001 From: Martin Kelly Date: Tue, 5 Dec 2017 11:15:50 -0800 Subject: [PATCH 0189/1013] can: usb_8dev: cancel urb on -EPIPE and -EPROTO commit 12147edc434c9e4c7c2f5fee2e5519b2e5ac34ce upstream. In mcba_usb, we have observed that when you unplug the device, the driver will endlessly resubmit failing URBs, which can cause CPU stalls. This issue is fixed in mcba_usb by catching the codes seen on device disconnect (-EPIPE and -EPROTO). This driver also resubmits in the case of -EPIPE and -EPROTO, so fix it in the same way. Signed-off-by: Martin Kelly Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/usb/usb_8dev.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/can/usb/usb_8dev.c b/drivers/net/can/usb/usb_8dev.c index d000cb62d6ae..27861c417c94 100644 --- a/drivers/net/can/usb/usb_8dev.c +++ b/drivers/net/can/usb/usb_8dev.c @@ -524,6 +524,8 @@ static void usb_8dev_read_bulk_callback(struct urb *urb) break; case -ENOENT: + case -EPIPE: + case -EPROTO: case -ESHUTDOWN: return; From 1f5d203906bc32ddb57d73ec6ea1694505ab7ffa Mon Sep 17 00:00:00 2001 From: Stephane Grosjean Date: Thu, 7 Dec 2017 16:13:43 +0100 Subject: [PATCH 0190/1013] can: peak/pcie_fd: fix potential bug in restarting tx queue commit 91785de6f94b58c3fb6664609e3682f011bd28d2 upstream. Don't rely on can_get_echo_skb() return value to wake the network tx queue up: can_get_echo_skb() returns 0 if the echo array slot was not occupied, but also when the DLC of the released echo frame was 0. Signed-off-by: Stephane Grosjean Signed-off-by: Marc Kleine-Budde Signed-off-by: Greg Kroah-Hartman --- drivers/net/can/peak_canfd/peak_canfd.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/drivers/net/can/peak_canfd/peak_canfd.c b/drivers/net/can/peak_canfd/peak_canfd.c index 85268be0c913..55513411a82e 100644 --- a/drivers/net/can/peak_canfd/peak_canfd.c +++ b/drivers/net/can/peak_canfd/peak_canfd.c @@ -258,21 +258,18 @@ static int pucan_handle_can_rx(struct peak_canfd_priv *priv, /* if this frame is an echo, */ if ((rx_msg_flags & PUCAN_MSG_LOOPED_BACK) && !(rx_msg_flags & PUCAN_MSG_SELF_RECEIVE)) { - int n; unsigned long flags; spin_lock_irqsave(&priv->echo_lock, flags); - n = can_get_echo_skb(priv->ndev, msg->client); + can_get_echo_skb(priv->ndev, msg->client); spin_unlock_irqrestore(&priv->echo_lock, flags); /* count bytes of the echo instead of skb */ stats->tx_bytes += cf_len; stats->tx_packets++; - if (n) { - /* restart tx queue only if a slot is free */ - netif_wake_queue(priv->ndev); - } + /* restart tx queue (a slot is free) */ + netif_wake_queue(priv->ndev); return 0; } From c5a5c47c40ce2478ad5b42b781b16c3e3849438f Mon Sep 17 00:00:00 2001 From: weiping zhang Date: Wed, 29 Nov 2017 09:23:01 +0800 Subject: [PATCH 0191/1013] virtio: release virtio index when fail to device_register commit e60ea67bb60459b95a50a156296041a13e0e380e upstream. index can be reused by other virtio device. 
Signed-off-by: weiping zhang Reviewed-by: Cornelia Huck Signed-off-by: Michael S. Tsirkin Signed-off-by: Greg Kroah-Hartman --- drivers/virtio/virtio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index 48230a5e12f2..bf7ff3934d7f 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -333,6 +333,8 @@ int register_virtio_device(struct virtio_device *dev) /* device_register() causes the bus infrastructure to look for a * matching driver. */ err = device_register(&dev->dev); + if (err) + ida_simple_remove(&virtio_index_ida, dev->index); out: if (err) virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED); From c55befc42a9e65fed575dd7ce4e605e1164df79a Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 6 Sep 2017 14:56:50 +0200 Subject: [PATCH 0192/1013] iio: stm32: fix adc/trigger link error commit 6d745ee8b5e81f3a33791e3c854fbbfd6f3e585e upstream. The ADC driver can trigger on either the timer or the lptim trigger, but it only uses a Kconfig 'select' statement to ensure that the first of the two is present. When the lptim trigger is enabled as a loadable module, and the adc driver is built-in, we now get a link error: drivers/iio/adc/stm32-adc.o: In function `stm32_adc_get_trig_extsel': stm32-adc.c:(.text+0x4e0): undefined reference to `is_stm32_lptim_trigger' We could use a second 'select' statement and always have both trigger drivers enabled when the adc driver is, but it seems that the lptimer trigger was intentionally left optional, so it seems better to keep it that way. This adds a hack to use 'IS_REACHABLE()' rather than 'IS_ENABLED()', which avoids the link error, but instead leads to the lptimer trigger not being used in the broken configuration. I've added a runtime warning for this case to help users figure out what they did wrong if this should ever be done by accident. Fixes: f0b638a7f6db ("iio: adc: stm32: add support for lptimer triggers") Signed-off-by: Arnd Bergmann Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- include/linux/iio/timer/stm32-lptim-trigger.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/iio/timer/stm32-lptim-trigger.h b/include/linux/iio/timer/stm32-lptim-trigger.h index 34d59bfdce2d..464458d20b16 100644 --- a/include/linux/iio/timer/stm32-lptim-trigger.h +++ b/include/linux/iio/timer/stm32-lptim-trigger.h @@ -16,11 +16,14 @@ #define LPTIM2_OUT "lptim2_out" #define LPTIM3_OUT "lptim3_out" -#if IS_ENABLED(CONFIG_IIO_STM32_LPTIMER_TRIGGER) +#if IS_REACHABLE(CONFIG_IIO_STM32_LPTIMER_TRIGGER) bool is_stm32_lptim_trigger(struct iio_trigger *trig); #else static inline bool is_stm32_lptim_trigger(struct iio_trigger *trig) { +#if IS_ENABLED(CONFIG_IIO_STM32_LPTIMER_TRIGGER) + pr_warn_once("stm32 lptim_trigger not linked in\n"); +#endif return false; } #endif From aaeba39ddfe21321c8cfd5e3caeaef1b7153b184 Mon Sep 17 00:00:00 2001 From: Peter Meerwald-Stadler Date: Fri, 27 Oct 2017 21:45:31 +0200 Subject: [PATCH 0193/1013] iio: health: max30102: Temperature should be in milli Celsius commit ad44a9f804c1591ba2a2ec0ac8d916a515d2790c upstream. As per ABI temperature should be in milli Celsius after scaling, not Celsius Note on stable cc. This driver is breaking the standard IIO ABI. 
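For reference, after the fix the reported scale is the fraction 1000/16 = 62.5 m°C per LSB. A standalone sketch of how a userspace reader applies it (the sysfs attribute name and the raw value are assumed for illustration):

#include <stdio.h>

int main(void)
{
	int raw = 400;			/* assumed in_temp_raw reading */
	double scale = 1000.0 / 16.0;	/* IIO_VAL_FRACTIONAL: val=1000, val2=16 */

	/* per the ABI, raw * scale is in milli degrees Celsius */
	printf("%.1f m°C = %.4f °C\n", raw * scale, raw * scale / 1000.0);
	return 0;
}

With the old scale of 1/16 the same reading came out as 25, which per the ABI would have been read as 25 m°C rather than the intended 25 °C.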
(JC) Signed-off-by: Peter Meerwald-Stadler Acked-by: Matt Ranostay Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/health/max30102.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/health/max30102.c b/drivers/iio/health/max30102.c index 839b875c29b9..9fb4bc73a6bc 100644 --- a/drivers/iio/health/max30102.c +++ b/drivers/iio/health/max30102.c @@ -371,7 +371,7 @@ static int max30102_read_raw(struct iio_dev *indio_dev, mutex_unlock(&indio_dev->mlock); break; case IIO_CHAN_INFO_SCALE: - *val = 1; /* 0.0625 */ + *val = 1000; /* 62.5 */ *val2 = 16; ret = IIO_VAL_FRACTIONAL; break; From 6f4a0a44ba5aafcb4884b863e0e27923c41ab2ec Mon Sep 17 00:00:00 2001 From: Pan Bian Date: Mon, 13 Nov 2017 00:01:20 +0800 Subject: [PATCH 0194/1013] iio: adc: cpcap: fix incorrect validation commit 81b039ec36a41a5451e1e36f05bb055eceab1dc8 upstream. Function platform_get_irq_byname() returns a negative error code on failure, and a zero or positive number on success. However, in function cpcap_adc_probe(), positive IRQ numbers are also taken as error cases. Use "if (ddata->irq < 0)" instead of "if (!ddata->irq)" to validate the return value of platform_get_irq_byname(). Signed-off-by: Pan Bian Fixes: 25ec249632d50 ("iio: adc: cpcap: Add minimal support for CPCAP PMIC ADC") Reviewed-by: Sebastian Reichel Acked-by: Tony Lindgren Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/cpcap-adc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/adc/cpcap-adc.c b/drivers/iio/adc/cpcap-adc.c index 6e419d5a7c14..f153e02686a0 100644 --- a/drivers/iio/adc/cpcap-adc.c +++ b/drivers/iio/adc/cpcap-adc.c @@ -1012,7 +1012,7 @@ static int cpcap_adc_probe(struct platform_device *pdev) platform_set_drvdata(pdev, indio_dev); ddata->irq = platform_get_irq_byname(pdev, "adcdone"); - if (!ddata->irq) + if (ddata->irq < 0) return -ENODEV; error = devm_request_threaded_irq(&pdev->dev, ddata->irq, NULL, From ac2d7838808c5bb68ea357f748cc4c1710555d6d Mon Sep 17 00:00:00 2001 From: Martin Blumenstingl Date: Tue, 31 Oct 2017 21:01:43 +0100 Subject: [PATCH 0195/1013] iio: adc: meson-saradc: fix the bit_idx of the adc_en clock commit 7a6b0420d2fe4ce59437bd318826fe468f0d71ae upstream. Meson8 and Meson8b SoCs use the the SAR ADC gate clock provided by the MESON_SAR_ADC_REG3 register within the SAR ADC register area. According to the datasheet (and the existing MESON_SAR_ADC_REG3_CLK_EN definition) the gate is on bit 30. The fls() function returns the last set bit, which is "bit index + 1" (fls(MESON_SAR_ADC_REG3_CLK_EN) returns 31). Fix this by switching to __ffs() which returns the first set bit, which is bit 30 in our case. This off by one error results in the ADC not being usable on devices where the bootloader did not enable the clock. 
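The off-by-one is easy to check outside the kernel with the compiler builtins behind these helpers, using the identities fls(x) == 32 - clz(x) and __ffs(x) == ctz(x) for non-zero x (standalone sketch):

#include <stdio.h>

#define BIT(n)				(1U << (n))
#define MESON_SAR_ADC_REG3_CLK_EN	BIT(30)	/* gate bit described above */

int main(void)
{
	unsigned int mask = MESON_SAR_ADC_REG3_CLK_EN;

	/* fls() is 1-based from the MSB side: one past the gate bit */
	printf("fls(CLK_EN)   = %d\n", 32 - __builtin_clz(mask));	/* 31 */
	/* __ffs() is 0-based from the LSB side: the gate bit itself */
	printf("__ffs(CLK_EN) = %d\n", __builtin_ctz(mask));		/* 30 */

	return 0;
}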
Fixes: 3adbf3427330 ("iio: adc: add a driver for the SAR ADC found in Amlogic Meson SoCs") Signed-off-by: Martin Blumenstingl Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/meson_saradc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iio/adc/meson_saradc.c b/drivers/iio/adc/meson_saradc.c index 2e8dbb89c8c9..55611244c799 100644 --- a/drivers/iio/adc/meson_saradc.c +++ b/drivers/iio/adc/meson_saradc.c @@ -600,7 +600,7 @@ static int meson_sar_adc_clk_init(struct iio_dev *indio_dev, init.num_parents = 1; priv->clk_gate.reg = base + MESON_SAR_ADC_REG3; - priv->clk_gate.bit_idx = fls(MESON_SAR_ADC_REG3_CLK_EN); + priv->clk_gate.bit_idx = __ffs(MESON_SAR_ADC_REG3_CLK_EN); priv->clk_gate.hw.init = &init; priv->adc_clk = devm_clk_register(&indio_dev->dev, &priv->clk_gate.hw); From b6c6d01a2debe8df9541a9bd21cc60760d20f6a2 Mon Sep 17 00:00:00 2001 From: Martin Blumenstingl Date: Tue, 31 Oct 2017 21:01:44 +0100 Subject: [PATCH 0196/1013] iio: adc: meson-saradc: initialize the bandgap correctly on older SoCs commit d85eed9f576369bc90322659de96b7dbea1f9a57 upstream. Meson8 and Meson8b do not have the MESON_SAR_ADC_REG11 register. The bandgap setting for these SoCs is configured in the MESON_SAR_ADC_DELTA_10 register instead. Make the driver aware of this difference and use the correct bandgap register depending on the SoC. This has worked fine on Meson8 and Meson8b because the bootloader is already initializing the bandgap setting. Fixes: 6c76ed31cd05 ("iio: adc: meson-saradc: add Meson8b SoC compatibility") Signed-off-by: Martin Blumenstingl Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/meson_saradc.c | 33 ++++++++++++++++++++++++++------- 1 file changed, 26 insertions(+), 7 deletions(-) diff --git a/drivers/iio/adc/meson_saradc.c b/drivers/iio/adc/meson_saradc.c index 55611244c799..abe9df879b2a 100644 --- a/drivers/iio/adc/meson_saradc.c +++ b/drivers/iio/adc/meson_saradc.c @@ -221,6 +221,7 @@ enum meson_sar_adc_chan7_mux_sel { struct meson_sar_adc_data { bool has_bl30_integration; + u32 bandgap_reg; unsigned int resolution; const char *name; }; @@ -685,6 +686,20 @@ static int meson_sar_adc_init(struct iio_dev *indio_dev) return 0; } +static void meson_sar_adc_set_bandgap(struct iio_dev *indio_dev, bool on_off) +{ + struct meson_sar_adc_priv *priv = iio_priv(indio_dev); + u32 enable_mask; + + if (priv->data->bandgap_reg == MESON_SAR_ADC_REG11) + enable_mask = MESON_SAR_ADC_REG11_BANDGAP_EN; + else + enable_mask = MESON_SAR_ADC_DELTA_10_TS_VBG_EN; + + regmap_update_bits(priv->regmap, priv->data->bandgap_reg, enable_mask, + on_off ? 
enable_mask : 0); +} + static int meson_sar_adc_hw_enable(struct iio_dev *indio_dev) { struct meson_sar_adc_priv *priv = iio_priv(indio_dev); @@ -717,9 +732,9 @@ static int meson_sar_adc_hw_enable(struct iio_dev *indio_dev) regval = FIELD_PREP(MESON_SAR_ADC_REG0_FIFO_CNT_IRQ_MASK, 1); regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG0, MESON_SAR_ADC_REG0_FIFO_CNT_IRQ_MASK, regval); - regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG11, - MESON_SAR_ADC_REG11_BANDGAP_EN, - MESON_SAR_ADC_REG11_BANDGAP_EN); + + meson_sar_adc_set_bandgap(indio_dev, true); + regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG3, MESON_SAR_ADC_REG3_ADC_EN, MESON_SAR_ADC_REG3_ADC_EN); @@ -739,8 +754,7 @@ static int meson_sar_adc_hw_enable(struct iio_dev *indio_dev) err_adc_clk: regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG3, MESON_SAR_ADC_REG3_ADC_EN, 0); - regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG11, - MESON_SAR_ADC_REG11_BANDGAP_EN, 0); + meson_sar_adc_set_bandgap(indio_dev, false); clk_disable_unprepare(priv->sana_clk); err_sana_clk: clk_disable_unprepare(priv->core_clk); @@ -765,8 +779,8 @@ static int meson_sar_adc_hw_disable(struct iio_dev *indio_dev) regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG3, MESON_SAR_ADC_REG3_ADC_EN, 0); - regmap_update_bits(priv->regmap, MESON_SAR_ADC_REG11, - MESON_SAR_ADC_REG11_BANDGAP_EN, 0); + + meson_sar_adc_set_bandgap(indio_dev, false); clk_disable_unprepare(priv->sana_clk); clk_disable_unprepare(priv->core_clk); @@ -845,30 +859,35 @@ static const struct iio_info meson_sar_adc_iio_info = { static const struct meson_sar_adc_data meson_sar_adc_meson8_data = { .has_bl30_integration = false, + .bandgap_reg = MESON_SAR_ADC_DELTA_10, .resolution = 10, .name = "meson-meson8-saradc", }; static const struct meson_sar_adc_data meson_sar_adc_meson8b_data = { .has_bl30_integration = false, + .bandgap_reg = MESON_SAR_ADC_DELTA_10, .resolution = 10, .name = "meson-meson8b-saradc", }; static const struct meson_sar_adc_data meson_sar_adc_gxbb_data = { .has_bl30_integration = true, + .bandgap_reg = MESON_SAR_ADC_REG11, .resolution = 10, .name = "meson-gxbb-saradc", }; static const struct meson_sar_adc_data meson_sar_adc_gxl_data = { .has_bl30_integration = true, + .bandgap_reg = MESON_SAR_ADC_REG11, .resolution = 12, .name = "meson-gxl-saradc", }; static const struct meson_sar_adc_data meson_sar_adc_gxm_data = { .has_bl30_integration = true, + .bandgap_reg = MESON_SAR_ADC_REG11, .resolution = 12, .name = "meson-gxm-saradc", }; From b13ec02ab4a24192ad56b47e39a82269cc4825ca Mon Sep 17 00:00:00 2001 From: Martin Blumenstingl Date: Tue, 31 Oct 2017 21:01:45 +0100 Subject: [PATCH 0197/1013] iio: adc: meson-saradc: Meson8 and Meson8b do not have REG11 and REG13 commit 96748823c483c6eed8321f78bd128dd33f09c55c upstream. The Meson GXBB and newer SoCs have a few more registers than the older Meson8 and Meson8b SoCs. Use a separate regmap config to limit the older SoCs to the DELTA_10 register. 
Fixes: 6c76ed31cd05 ("iio: adc: meson-saradc: add Meson8b SoC compatibility") Signed-off-by: Martin Blumenstingl Signed-off-by: Jonathan Cameron Signed-off-by: Greg Kroah-Hartman --- drivers/iio/adc/meson_saradc.c | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/iio/adc/meson_saradc.c b/drivers/iio/adc/meson_saradc.c index abe9df879b2a..7dc7d297a0fc 100644 --- a/drivers/iio/adc/meson_saradc.c +++ b/drivers/iio/adc/meson_saradc.c @@ -224,6 +224,7 @@ struct meson_sar_adc_data { u32 bandgap_reg; unsigned int resolution; const char *name; + const struct regmap_config *regmap_config; }; struct meson_sar_adc_priv { @@ -243,13 +244,20 @@ struct meson_sar_adc_priv { int calibscale; }; -static const struct regmap_config meson_sar_adc_regmap_config = { +static const struct regmap_config meson_sar_adc_regmap_config_gxbb = { .reg_bits = 8, .val_bits = 32, .reg_stride = 4, .max_register = MESON_SAR_ADC_REG13, }; +static const struct regmap_config meson_sar_adc_regmap_config_meson8 = { + .reg_bits = 8, + .val_bits = 32, + .reg_stride = 4, + .max_register = MESON_SAR_ADC_DELTA_10, +}; + static unsigned int meson_sar_adc_get_fifo_count(struct iio_dev *indio_dev) { struct meson_sar_adc_priv *priv = iio_priv(indio_dev); @@ -860,6 +868,7 @@ static const struct iio_info meson_sar_adc_iio_info = { static const struct meson_sar_adc_data meson_sar_adc_meson8_data = { .has_bl30_integration = false, .bandgap_reg = MESON_SAR_ADC_DELTA_10, + .regmap_config = &meson_sar_adc_regmap_config_meson8, .resolution = 10, .name = "meson-meson8-saradc", }; @@ -867,6 +876,7 @@ static const struct meson_sar_adc_data meson_sar_adc_meson8_data = { static const struct meson_sar_adc_data meson_sar_adc_meson8b_data = { .has_bl30_integration = false, .bandgap_reg = MESON_SAR_ADC_DELTA_10, + .regmap_config = &meson_sar_adc_regmap_config_meson8, .resolution = 10, .name = "meson-meson8b-saradc", }; @@ -874,6 +884,7 @@ static const struct meson_sar_adc_data meson_sar_adc_meson8b_data = { static const struct meson_sar_adc_data meson_sar_adc_gxbb_data = { .has_bl30_integration = true, .bandgap_reg = MESON_SAR_ADC_REG11, + .regmap_config = &meson_sar_adc_regmap_config_gxbb, .resolution = 10, .name = "meson-gxbb-saradc", }; @@ -881,6 +892,7 @@ static const struct meson_sar_adc_data meson_sar_adc_gxbb_data = { static const struct meson_sar_adc_data meson_sar_adc_gxl_data = { .has_bl30_integration = true, .bandgap_reg = MESON_SAR_ADC_REG11, + .regmap_config = &meson_sar_adc_regmap_config_gxbb, .resolution = 12, .name = "meson-gxl-saradc", }; @@ -888,6 +900,7 @@ static const struct meson_sar_adc_data meson_sar_adc_gxl_data = { static const struct meson_sar_adc_data meson_sar_adc_gxm_data = { .has_bl30_integration = true, .bandgap_reg = MESON_SAR_ADC_REG11, + .regmap_config = &meson_sar_adc_regmap_config_gxbb, .resolution = 12, .name = "meson-gxm-saradc", }; @@ -965,7 +978,7 @@ static int meson_sar_adc_probe(struct platform_device *pdev) return ret; priv->regmap = devm_regmap_init_mmio(&pdev->dev, base, - &meson_sar_adc_regmap_config); + priv->data->regmap_config); if (IS_ERR(priv->regmap)) return PTR_ERR(priv->regmap); From 6c802ac42f970976d26241068e7202a84c9fd8b7 Mon Sep 17 00:00:00 2001 From: Gregory CLEMENT Date: Tue, 14 Nov 2017 17:51:50 +0100 Subject: [PATCH 0198/1013] pinctrl: armada-37xx: Fix direction_output() callback behavior commit 6702abb3bf2394f250af0ee04070227bb5dda788 upstream. 
The direction_output callback of the gpio_chip structure is supposed to set the output direction but also to set the value of the gpio. For the armada-37xx driver this callback acted as the gpio_set_direction callback for the pinctrl. This patch fixes the behavior of the direction_output callback by also applying the value received as parameter. Fixes: 5715092a458c ("pinctrl: armada-37xx: Add gpio support") Reported-by: Alexandre Belloni Signed-off-by: Gregory CLEMENT Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c index 71b944748304..c5fe7d4a9065 100644 --- a/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c +++ b/drivers/pinctrl/mvebu/pinctrl-armada-37xx.c @@ -408,12 +408,21 @@ static int armada_37xx_gpio_direction_output(struct gpio_chip *chip, { struct armada_37xx_pinctrl *info = gpiochip_get_data(chip); unsigned int reg = OUTPUT_EN; - unsigned int mask; + unsigned int mask, val, ret; armada_37xx_update_reg(®, offset); mask = BIT(offset); - return regmap_update_bits(info->regmap, reg, mask, mask); + ret = regmap_update_bits(info->regmap, reg, mask, mask); + + if (ret) + return ret; + + reg = OUTPUT_VAL; + val = value ? mask : 0; + regmap_update_bits(info->regmap, reg, mask, val); + + return 0; } static int armada_37xx_gpio_get(struct gpio_chip *chip, unsigned int offset) From 3f77a2c68ba8377f7f7ddcadff5ffda238cb83db Mon Sep 17 00:00:00 2001 From: "K. Y. Srinivasan" Date: Tue, 14 Nov 2017 06:53:33 -0700 Subject: [PATCH 0199/1013] Drivers: hv: vmbus: Fix a rescind issue commit 7fa32e5ec28b1609abc0b797b58267f725fc3964 upstream. The current rescind processing code will not correctly handle the case where the host immediately rescinds a channel that has been offerred. In this case, we could be blocked in the open call and since the channel is rescinded, the host will not respond and we could be blocked forever in the vmbus open call.i Fix this problem. Signed-off-by: K. Y. Srinivasan Signed-off-by: Greg Kroah-Hartman --- drivers/hv/channel.c | 10 ++++++++-- drivers/hv/channel_mgmt.c | 7 ++++--- include/linux/hyperv.h | 1 + 3 files changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index 894b67ac2cae..05964347008d 100644 --- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -640,22 +640,28 @@ void vmbus_close(struct vmbus_channel *channel) */ return; } - mutex_lock(&vmbus_connection.channel_mutex); /* * Close all the sub-channels first and then close the * primary channel. */ list_for_each_safe(cur, tmp, &channel->sc_list) { cur_channel = list_entry(cur, struct vmbus_channel, sc_list); - vmbus_close_internal(cur_channel); if (cur_channel->rescind) { + wait_for_completion(&cur_channel->rescind_event); + mutex_lock(&vmbus_connection.channel_mutex); + vmbus_close_internal(cur_channel); hv_process_channel_removal( cur_channel->offermsg.child_relid); + } else { + mutex_lock(&vmbus_connection.channel_mutex); + vmbus_close_internal(cur_channel); } + mutex_unlock(&vmbus_connection.channel_mutex); } /* * Now close the primary. 
*/ + mutex_lock(&vmbus_connection.channel_mutex); vmbus_close_internal(channel); mutex_unlock(&vmbus_connection.channel_mutex); } diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c index 379b0df123be..65c6d6bdce4c 100644 --- a/drivers/hv/channel_mgmt.c +++ b/drivers/hv/channel_mgmt.c @@ -333,6 +333,7 @@ static struct vmbus_channel *alloc_channel(void) return NULL; spin_lock_init(&channel->lock); + init_completion(&channel->rescind_event); INIT_LIST_HEAD(&channel->sc_list); INIT_LIST_HEAD(&channel->percpu_list); @@ -883,6 +884,7 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr) /* * Now wait for offer handling to complete. */ + vmbus_rescind_cleanup(channel); while (READ_ONCE(channel->probe_done) == false) { /* * We wait here until any channel offer is currently @@ -898,7 +900,6 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr) if (channel->device_obj) { if (channel->chn_rescind_callback) { channel->chn_rescind_callback(channel); - vmbus_rescind_cleanup(channel); return; } /* @@ -907,7 +908,6 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr) */ dev = get_device(&channel->device_obj->device); if (dev) { - vmbus_rescind_cleanup(channel); vmbus_device_unregister(channel->device_obj); put_device(dev); } @@ -921,13 +921,14 @@ static void vmbus_onoffer_rescind(struct vmbus_channel_message_header *hdr) * 2. Then close the primary channel. */ mutex_lock(&vmbus_connection.channel_mutex); - vmbus_rescind_cleanup(channel); if (channel->state == CHANNEL_OPEN_STATE) { /* * The channel is currently not open; * it is safe for us to cleanup the channel. */ hv_process_channel_removal(rescind->child_relid); + } else { + complete(&channel->rescind_event); } mutex_unlock(&vmbus_connection.channel_mutex); } diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 6431087816ba..ba74eaa8eadf 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -708,6 +708,7 @@ struct vmbus_channel { u8 monitor_bit; bool rescind; /* got rescind msg */ + struct completion rescind_event; u32 ringbuffer_gpadlhandle; From 070703364a970ac029f04c09d3e42a6d2287e01a Mon Sep 17 00:00:00 2001 From: Paul Meyer Date: Tue, 14 Nov 2017 13:06:47 -0700 Subject: [PATCH 0200/1013] hv: kvp: Avoid reading past allocated blocks from KVP file commit 297d6b6e56c2977fc504c61bbeeaa21296923f89 upstream. While reading in more than one block (50) of KVP records, the allocation goes per block, but the reads used the total number of allocated records (without resetting the pointer/stream). This causes the records buffer to overrun when the refresh reads more than one block over the previous capacity (e.g. reading more than 100 KVP records whereas the in-memory database was empty before). Fix this by reading the correct number of KVP records from file each time. Signed-off-by: Paul Meyer Signed-off-by: Long Li Signed-off-by: K. Y. 
Srinivasan Signed-off-by: Greg Kroah-Hartman --- tools/hv/hv_kvp_daemon.c | 70 ++++++++-------------------------------- 1 file changed, 14 insertions(+), 56 deletions(-) diff --git a/tools/hv/hv_kvp_daemon.c b/tools/hv/hv_kvp_daemon.c index eaa3bec273c8..4c99c57736ce 100644 --- a/tools/hv/hv_kvp_daemon.c +++ b/tools/hv/hv_kvp_daemon.c @@ -193,11 +193,14 @@ static void kvp_update_mem_state(int pool) for (;;) { readp = &record[records_read]; records_read += fread(readp, sizeof(struct kvp_record), - ENTRIES_PER_BLOCK * num_blocks, - filep); + ENTRIES_PER_BLOCK * num_blocks - records_read, + filep); if (ferror(filep)) { - syslog(LOG_ERR, "Failed to read file, pool: %d", pool); + syslog(LOG_ERR, + "Failed to read file, pool: %d; error: %d %s", + pool, errno, strerror(errno)); + kvp_release_lock(pool); exit(EXIT_FAILURE); } @@ -210,6 +213,7 @@ static void kvp_update_mem_state(int pool) if (record == NULL) { syslog(LOG_ERR, "malloc failed"); + kvp_release_lock(pool); exit(EXIT_FAILURE); } continue; @@ -224,15 +228,11 @@ static void kvp_update_mem_state(int pool) fclose(filep); kvp_release_lock(pool); } + static int kvp_file_init(void) { int fd; - FILE *filep; - size_t records_read; char *fname; - struct kvp_record *record; - struct kvp_record *readp; - int num_blocks; int i; int alloc_unit = sizeof(struct kvp_record) * ENTRIES_PER_BLOCK; @@ -246,61 +246,19 @@ static int kvp_file_init(void) for (i = 0; i < KVP_POOL_COUNT; i++) { fname = kvp_file_info[i].fname; - records_read = 0; - num_blocks = 1; sprintf(fname, "%s/.kvp_pool_%d", KVP_CONFIG_LOC, i); fd = open(fname, O_RDWR | O_CREAT | O_CLOEXEC, 0644 /* rw-r--r-- */); if (fd == -1) return 1; - - filep = fopen(fname, "re"); - if (!filep) { - close(fd); - return 1; - } - - record = malloc(alloc_unit * num_blocks); - if (record == NULL) { - fclose(filep); - close(fd); - return 1; - } - for (;;) { - readp = &record[records_read]; - records_read += fread(readp, sizeof(struct kvp_record), - ENTRIES_PER_BLOCK, - filep); - - if (ferror(filep)) { - syslog(LOG_ERR, "Failed to read file, pool: %d", - i); - exit(EXIT_FAILURE); - } - - if (!feof(filep)) { - /* - * We have more data to read. - */ - num_blocks++; - record = realloc(record, alloc_unit * - num_blocks); - if (record == NULL) { - fclose(filep); - close(fd); - return 1; - } - continue; - } - break; - } kvp_file_info[i].fd = fd; - kvp_file_info[i].num_blocks = num_blocks; - kvp_file_info[i].records = record; - kvp_file_info[i].num_records = records_read; - fclose(filep); - + kvp_file_info[i].num_blocks = 1; + kvp_file_info[i].records = malloc(alloc_unit); + if (kvp_file_info[i].records == NULL) + return 1; + kvp_file_info[i].num_records = 0; + kvp_update_mem_state(i); } return 0; From 7b167dc4b875aa9d3bddf5c29b34ccfe8b87174e Mon Sep 17 00:00:00 2001 From: "Robin H. Johnson" Date: Thu, 16 Nov 2017 14:36:12 -0800 Subject: [PATCH 0201/1013] firmware: cleanup FIRMWARE_IN_KERNEL message commit 0946b2fb38fdb6585a5ac3ca84ac73924f645952 upstream. The help for FIRMWARE_IN_KERNEL still references the firmware_install command that was recently removed by commit 5620a0d1aacd ("firmware: delete in-kernel firmware"). Clean up the message to direct the user to their distribution's linux-firmware package, and remove any reference to firmware being included in the kernel source tree. Fixes: 5620a0d1aacd ("firmware: delete in-kernel firmware"). Cc: Masahiro Yamada Cc: David Woodhouse Signed-off-by: Robin H. 
Johnson Signed-off-by: Greg Kroah-Hartman --- drivers/base/Kconfig | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig index 2f6614c9a229..bdc87907d6a1 100644 --- a/drivers/base/Kconfig +++ b/drivers/base/Kconfig @@ -91,22 +91,23 @@ config FIRMWARE_IN_KERNEL depends on FW_LOADER default y help - The kernel source tree includes a number of firmware 'blobs' - that are used by various drivers. The recommended way to - use these is to run "make firmware_install", which, after - converting ihex files to binary, copies all of the needed - binary files in firmware/ to /lib/firmware/ on your system so - that they can be loaded by userspace helpers on request. + Various drivers in the kernel source tree may require firmware, + which is generally available in your distribution's linux-firmware + package. + + The linux-firmware package should install firmware into + /lib/firmware/ on your system, so they can be loaded by userspace + helpers on request. Enabling this option will build each required firmware blob - into the kernel directly, where request_firmware() will find - them without having to call out to userspace. This may be - useful if your root file system requires a device that uses - such firmware and do not wish to use an initrd. + specified by EXTRA_FIRMWARE into the kernel directly, where + request_firmware() will find them without having to call out to + userspace. This may be useful if your root file system requires a + device that uses such firmware and you do not wish to use an + initrd. This single option controls the inclusion of firmware for - every driver that uses request_firmware() and ships its - firmware in the kernel source tree, which avoids a + every driver that uses request_firmware(), which avoids a proliferation of 'Include firmware for xxx device' options. Say 'N' and let firmware be loaded from userspace. From cdc5340c054b37655ea1f8fef99d3e4ab39855d8 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 15 Nov 2017 13:00:43 -0800 Subject: [PATCH 0202/1013] firmware: vpd: Destroy vpd sections in remove function commit 811d7e0215fb738fb9a9f0bcb1276516ad161ed1 upstream. vpd sections are initialized during probe and thus should be destroyed in the remove function. 
Fixes: 049a59db34eb ("firmware: Google VPD sysfs driver") Signed-off-by: Guenter Roeck Reviewed-by: Dmitry Torokhov Tested-by: Randy Dunlap Signed-off-by: Greg Kroah-Hartman Signed-off-by: Greg Kroah-Hartman --- drivers/firmware/google/vpd.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/firmware/google/vpd.c b/drivers/firmware/google/vpd.c index 35e553b3b190..84217172297b 100644 --- a/drivers/firmware/google/vpd.c +++ b/drivers/firmware/google/vpd.c @@ -298,8 +298,17 @@ static int vpd_probe(struct platform_device *pdev) return vpd_sections_init(entry.cbmem_addr); } +static int vpd_remove(struct platform_device *pdev) +{ + vpd_section_destroy(&ro_vpd); + vpd_section_destroy(&rw_vpd); + + return 0; +} + static struct platform_driver vpd_driver = { .probe = vpd_probe, + .remove = vpd_remove, .driver = { .name = "vpd", }, @@ -324,8 +333,6 @@ static int __init vpd_platform_init(void) static void __exit vpd_platform_exit(void) { - vpd_section_destroy(&ro_vpd); - vpd_section_destroy(&rw_vpd); kobject_put(vpd_kobj); } From 03bb8e9a18190ee12f5250ff4e97ac035fb2abca Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 15 Nov 2017 13:00:44 -0800 Subject: [PATCH 0203/1013] firmware: vpd: Tie firmware kobject to device lifetime commit e4b28b3c3a405b251fa25db58abe1512814a680a upstream. It doesn't make sense to have /sys/firmware/vpd if the device is not instantiated, so tie its lifetime to the device. Fixes: 049a59db34eb ("firmware: Google VPD sysfs driver") Signed-off-by: Guenter Roeck Reviewed-by: Dmitry Torokhov Tested-by: Randy Dunlap Signed-off-by: Greg Kroah-Hartman --- drivers/firmware/google/vpd.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) diff --git a/drivers/firmware/google/vpd.c b/drivers/firmware/google/vpd.c index 84217172297b..942e358efa60 100644 --- a/drivers/firmware/google/vpd.c +++ b/drivers/firmware/google/vpd.c @@ -295,7 +295,17 @@ static int vpd_probe(struct platform_device *pdev) if (ret) return ret; - return vpd_sections_init(entry.cbmem_addr); + vpd_kobj = kobject_create_and_add("vpd", firmware_kobj); + if (!vpd_kobj) + return -ENOMEM; + + ret = vpd_sections_init(entry.cbmem_addr); + if (ret) { + kobject_put(vpd_kobj); + return ret; + } + + return 0; } static int vpd_remove(struct platform_device *pdev) @@ -303,6 +313,8 @@ static int vpd_remove(struct platform_device *pdev) vpd_section_destroy(&ro_vpd); vpd_section_destroy(&rw_vpd); + kobject_put(vpd_kobj); + return 0; } @@ -322,10 +334,6 @@ static int __init vpd_platform_init(void) if (IS_ERR(pdev)) return PTR_ERR(pdev); - vpd_kobj = kobject_create_and_add("vpd", firmware_kobj); - if (!vpd_kobj) - return -ENOMEM; - platform_driver_register(&vpd_driver); return 0; @@ -333,7 +341,6 @@ static int __init vpd_platform_init(void) static void __exit vpd_platform_exit(void) { - kobject_put(vpd_kobj); } module_init(vpd_platform_init); From ff8307709c0aff44bb1919e7d98ffa245af7cf6a Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 15 Nov 2017 13:00:45 -0800 Subject: [PATCH 0204/1013] firmware: vpd: Fix platform driver and device registration/unregistration commit 0631fb8b027f5968c2f5031f0b3ff7be3e4bebcc upstream. The driver exit function needs to unregister both platform device and driver. Also, during registration, register driver first and perform error checks. 
Fixes: 049a59db34eb ("firmware: Google VPD sysfs driver") Signed-off-by: Guenter Roeck Tested-by: Randy Dunlap Reviewed-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/firmware/google/vpd.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/drivers/firmware/google/vpd.c b/drivers/firmware/google/vpd.c index 942e358efa60..e4b40f2b4627 100644 --- a/drivers/firmware/google/vpd.c +++ b/drivers/firmware/google/vpd.c @@ -326,21 +326,29 @@ static struct platform_driver vpd_driver = { }, }; +static struct platform_device *vpd_pdev; + static int __init vpd_platform_init(void) { - struct platform_device *pdev; + int ret; - pdev = platform_device_register_simple("vpd", -1, NULL, 0); - if (IS_ERR(pdev)) - return PTR_ERR(pdev); + ret = platform_driver_register(&vpd_driver); + if (ret) + return ret; - platform_driver_register(&vpd_driver); + vpd_pdev = platform_device_register_simple("vpd", -1, NULL, 0); + if (IS_ERR(vpd_pdev)) { + platform_driver_unregister(&vpd_driver); + return PTR_ERR(vpd_pdev); + } return 0; } static void __exit vpd_platform_exit(void) { + platform_device_unregister(vpd_pdev); + platform_driver_unregister(&vpd_driver); } module_init(vpd_platform_init); From 400391d711b1f783c85679b70f71c5fea56c0341 Mon Sep 17 00:00:00 2001 From: William Breathitt Gray Date: Wed, 8 Nov 2017 10:23:11 -0500 Subject: [PATCH 0205/1013] isa: Prevent NULL dereference in isa_bus driver callbacks commit 5a244727f428a06634f22bb890e78024ab0c89f3 upstream. The isa_driver structure for an isa_bus device is stored in the device platform_data member of the respective device structure. This platform_data member may be reset to NULL if isa_driver match callback for the device fails, indicating a device unsupported by the ISA driver. This patch fixes a possible NULL pointer dereference if one of the isa_driver callbacks to attempted for an unsupported device. This error should not occur in practice since ISA devices are typically manually configured and loaded by the users, but we may as well prevent this error from popping up for the 0day testers. 
Fixes: a5117ba7da37 ("[PATCH] Driver model: add ISA bus") Signed-off-by: William Breathitt Gray Acked-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- drivers/base/isa.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/base/isa.c b/drivers/base/isa.c index cd6ccdcf9df0..372d10af2600 100644 --- a/drivers/base/isa.c +++ b/drivers/base/isa.c @@ -39,7 +39,7 @@ static int isa_bus_probe(struct device *dev) { struct isa_driver *isa_driver = dev->platform_data; - if (isa_driver->probe) + if (isa_driver && isa_driver->probe) return isa_driver->probe(dev, to_isa_dev(dev)->id); return 0; @@ -49,7 +49,7 @@ static int isa_bus_remove(struct device *dev) { struct isa_driver *isa_driver = dev->platform_data; - if (isa_driver->remove) + if (isa_driver && isa_driver->remove) return isa_driver->remove(dev, to_isa_dev(dev)->id); return 0; @@ -59,7 +59,7 @@ static void isa_bus_shutdown(struct device *dev) { struct isa_driver *isa_driver = dev->platform_data; - if (isa_driver->shutdown) + if (isa_driver && isa_driver->shutdown) isa_driver->shutdown(dev, to_isa_dev(dev)->id); } @@ -67,7 +67,7 @@ static int isa_bus_suspend(struct device *dev, pm_message_t state) { struct isa_driver *isa_driver = dev->platform_data; - if (isa_driver->suspend) + if (isa_driver && isa_driver->suspend) return isa_driver->suspend(dev, to_isa_dev(dev)->id, state); return 0; @@ -77,7 +77,7 @@ static int isa_bus_resume(struct device *dev) { struct isa_driver *isa_driver = dev->platform_data; - if (isa_driver->resume) + if (isa_driver && isa_driver->resume) return isa_driver->resume(dev, to_isa_dev(dev)->id); return 0; From bc2b3046048bc193def020ef9d871f4edb3f47ba Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 21 Nov 2017 14:23:37 +0100 Subject: [PATCH 0206/1013] scsi: dma-mapping: always provide dma_get_cache_alignment commit 860dd4424f344400b491b212ee4acb3a358ba9d9 upstream. Provide the dummy version of dma_get_cache_alignment that always returns 1 even if CONFIG_HAS_DMA is not set, so that drivers and subsystems can use it without ifdefs. Signed-off-by: Christoph Hellwig Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- include/linux/dma-mapping.h | 2 -- 1 file changed, 2 deletions(-) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 7653ea66874d..46930f82a988 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -697,7 +697,6 @@ static inline void *dma_zalloc_coherent(struct device *dev, size_t size, return ret; } -#ifdef CONFIG_HAS_DMA static inline int dma_get_cache_alignment(void) { #ifdef ARCH_DMA_MINALIGN @@ -705,7 +704,6 @@ static inline int dma_get_cache_alignment(void) #endif return 1; } -#endif /* flags for the coherent memory api */ #define DMA_MEMORY_EXCLUSIVE 0x01 From 21747052c4b31f453e9fbbf4ee46f221c58cf2cd Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Tue, 21 Nov 2017 14:23:38 +0100 Subject: [PATCH 0207/1013] scsi: use dma_get_cache_alignment() as minimum DMA alignment commit 90addc6b3c9cda0146fbd62a08e234c2b224a80c upstream. In non-coherent DMA mode, kernel uses cache flushing operations to maintain I/O coherency, so scsi's block queue should be aligned to the value returned by dma_get_cache_alignment(). Otherwise, If a DMA buffer and a kernel structure share a same cache line, and if the kernel structure has dirty data, cache_invalidate (no writeback) will cause data corruption. 
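Note that blk_queue_dma_alignment() takes a mask rather than a byte count, which is why the hunk below passes max(4, dma_get_cache_alignment()) - 1: the block layer accepts a buffer address when (addr & mask) == 0. A quick standalone check of the arithmetic (the 64-byte cache line is only an assumed example value):

#include <stdio.h>

#define max(a, b)	((a) > (b) ? (a) : (b))

int main(void)
{
	int cache_align = 64;			/* assumed dma_get_cache_alignment() */
	int mask = max(4, cache_align) - 1;	/* 0x3f: demands 64-byte alignment */
	unsigned long aligned   = 0x10000040;	/* 64-byte aligned address */
	unsigned long unaligned = 0x10000004;	/* only 4-byte aligned */

	printf("mask = 0x%x\n", mask);
	printf("0x%lx %s acceptable\n", aligned,   (aligned   & mask) ? "is not" : "is");
	printf("0x%lx %s acceptable\n", unaligned, (unaligned & mask) ? "is not" : "is");
	return 0;
}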
Signed-off-by: Huacai Chen [hch: rebased and updated the comment and changelog] Signed-off-by: Christoph Hellwig Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/scsi_lib.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c index bcc1694cebcd..635cfa1f2ace 100644 --- a/drivers/scsi/scsi_lib.c +++ b/drivers/scsi/scsi_lib.c @@ -2126,11 +2126,13 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q) q->limits.cluster = 0; /* - * set a reasonable default alignment on word boundaries: the - * host and device may alter it using - * blk_queue_update_dma_alignment() later. + * Set a reasonable default alignment: The larger of 32-byte (dword), + * which is a common minimum for HBAs, and the minimum DMA alignment, + * which is set by the platform. + * + * Devices that require a bigger alignment can increase it later. */ - blk_queue_dma_alignment(q, 0x03); + blk_queue_dma_alignment(q, max(4, dma_get_cache_alignment()) - 1); } EXPORT_SYMBOL_GPL(__scsi_init_queue); From a326fc91abb834c832166cff9f4362f98ff7f95d Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Tue, 21 Nov 2017 14:23:39 +0100 Subject: [PATCH 0208/1013] scsi: libsas: align sata_device's rps_resp on a cacheline commit c2e8fbf908afd81ad502b567a6639598f92c9b9d upstream. The rps_resp buffer in ata_device is a DMA target, but it isn't explicitly cacheline aligned. Due to this, adjacent fields can be overwritten with stale data from memory on non-coherent architectures. As a result, the kernel is sometimes unable to communicate with an SATA device behind a SAS expander. Fix this by ensuring that the rps_resp buffer is cacheline aligned. This issue is similar to that fixed by Commit 84bda12af31f93 ("libata: align ap->sector_buf") and Commit 4ee34ea3a12396f35b26 ("libata: Align ata_device's id on a cacheline"). Signed-off-by: Huacai Chen Signed-off-by: Christoph Hellwig Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- include/scsi/libsas.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/scsi/libsas.h b/include/scsi/libsas.h index 6c0dc6155ee7..a966d281dedc 100644 --- a/include/scsi/libsas.h +++ b/include/scsi/libsas.h @@ -165,11 +165,11 @@ struct expander_device { struct sata_device { unsigned int class; - struct smp_resp rps_resp; /* report_phy_sata_resp */ u8 port_no; /* port number, if this is a PM (Port) */ struct ata_port *ap; struct ata_host ata_host; + struct smp_resp rps_resp ____cacheline_aligned; /* report_phy_sata_resp */ u8 fis[ATA_RESP_FIS_SIZE]; }; From 985ce9ee25cca9d8841b0cc31ced92dbd1f9bf99 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 6 Dec 2017 09:50:08 +0000 Subject: [PATCH 0209/1013] efi: Move some sysfs files to be read-only by root commit af97a77bc01ce49a466f9d4c0125479e2e2230b6 upstream. Thanks to the scripts/leaking_addresses.pl script, it was found that some EFI values should not be readable by non-root users. So make them root-only, and to do that, add a __ATTR_RO_MODE() macro to make this easier, and use it in other places at the same time. Reported-by: Linus Torvalds Tested-by: Dave Young Signed-off-by: Greg Kroah-Hartman Signed-off-by: Ard Biesheuvel Cc: H. 
Peter Anvin Cc: Matt Fleming Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20171206095010.24170-2-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- drivers/firmware/efi/efi.c | 3 +-- drivers/firmware/efi/esrt.c | 15 ++++++--------- drivers/firmware/efi/runtime-map.c | 10 +++++----- include/linux/sysfs.h | 6 ++++++ 4 files changed, 18 insertions(+), 16 deletions(-) diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c index f70febf680c3..c3eefa126e3b 100644 --- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -143,8 +143,7 @@ static ssize_t systab_show(struct kobject *kobj, return str - buf; } -static struct kobj_attribute efi_attr_systab = - __ATTR(systab, 0400, systab_show, NULL); +static struct kobj_attribute efi_attr_systab = __ATTR_RO_MODE(systab, 0400); #define EFI_FIELD(var) efi.var diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c index bd7ed3c1148a..7aae2483fcb9 100644 --- a/drivers/firmware/efi/esrt.c +++ b/drivers/firmware/efi/esrt.c @@ -106,7 +106,7 @@ static const struct sysfs_ops esre_attr_ops = { }; /* Generic ESRT Entry ("ESRE") support. */ -static ssize_t esre_fw_class_show(struct esre_entry *entry, char *buf) +static ssize_t fw_class_show(struct esre_entry *entry, char *buf) { char *str = buf; @@ -117,18 +117,16 @@ static ssize_t esre_fw_class_show(struct esre_entry *entry, char *buf) return str - buf; } -static struct esre_attribute esre_fw_class = __ATTR(fw_class, 0400, - esre_fw_class_show, NULL); +static struct esre_attribute esre_fw_class = __ATTR_RO_MODE(fw_class, 0400); #define esre_attr_decl(name, size, fmt) \ -static ssize_t esre_##name##_show(struct esre_entry *entry, char *buf) \ +static ssize_t name##_show(struct esre_entry *entry, char *buf) \ { \ return sprintf(buf, fmt "\n", \ le##size##_to_cpu(entry->esre.esre1->name)); \ } \ \ -static struct esre_attribute esre_##name = __ATTR(name, 0400, \ - esre_##name##_show, NULL) +static struct esre_attribute esre_##name = __ATTR_RO_MODE(name, 0400) esre_attr_decl(fw_type, 32, "%u"); esre_attr_decl(fw_version, 32, "%u"); @@ -193,14 +191,13 @@ static int esre_create_sysfs_entry(void *esre, int entry_num) /* support for displaying ESRT fields at the top level */ #define esrt_attr_decl(name, size, fmt) \ -static ssize_t esrt_##name##_show(struct kobject *kobj, \ +static ssize_t name##_show(struct kobject *kobj, \ struct kobj_attribute *attr, char *buf)\ { \ return sprintf(buf, fmt "\n", le##size##_to_cpu(esrt->name)); \ } \ \ -static struct kobj_attribute esrt_##name = __ATTR(name, 0400, \ - esrt_##name##_show, NULL) +static struct kobj_attribute esrt_##name = __ATTR_RO_MODE(name, 0400) esrt_attr_decl(fw_resource_count, 32, "%u"); esrt_attr_decl(fw_resource_count_max, 32, "%u"); diff --git a/drivers/firmware/efi/runtime-map.c b/drivers/firmware/efi/runtime-map.c index 8e64b77aeac9..f377609ff141 100644 --- a/drivers/firmware/efi/runtime-map.c +++ b/drivers/firmware/efi/runtime-map.c @@ -63,11 +63,11 @@ static ssize_t map_attr_show(struct kobject *kobj, struct attribute *attr, return map_attr->show(entry, buf); } -static struct map_attribute map_type_attr = __ATTR_RO(type); -static struct map_attribute map_phys_addr_attr = __ATTR_RO(phys_addr); -static struct map_attribute map_virt_addr_attr = __ATTR_RO(virt_addr); -static struct map_attribute map_num_pages_attr = __ATTR_RO(num_pages); -static struct map_attribute map_attribute_attr = __ATTR_RO(attribute); +static struct 
map_attribute map_type_attr = __ATTR_RO_MODE(type, 0400); +static struct map_attribute map_phys_addr_attr = __ATTR_RO_MODE(phys_addr, 0400); +static struct map_attribute map_virt_addr_attr = __ATTR_RO_MODE(virt_addr, 0400); +static struct map_attribute map_num_pages_attr = __ATTR_RO_MODE(num_pages, 0400); +static struct map_attribute map_attribute_attr = __ATTR_RO_MODE(attribute, 0400); /* * These are default attributes that are added for every memmap entry. diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h index e32dfe098e82..40839c02d28c 100644 --- a/include/linux/sysfs.h +++ b/include/linux/sysfs.h @@ -117,6 +117,12 @@ struct attribute_group { .show = _name##_show, \ } +#define __ATTR_RO_MODE(_name, _mode) { \ + .attr = { .name = __stringify(_name), \ + .mode = VERIFY_OCTAL_PERMISSIONS(_mode) }, \ + .show = _name##_show, \ +} + #define __ATTR_WO(_name) { \ .attr = { .name = __stringify(_name), .mode = S_IWUSR }, \ .store = _name##_store, \ From f4d901739324cc1c400f322767a9df6de7d838ce Mon Sep 17 00:00:00 2001 From: Pan Bian Date: Wed, 6 Dec 2017 09:50:09 +0000 Subject: [PATCH 0210/1013] efi/esrt: Use memunmap() instead of kfree() to free the remapping commit 89c5a2d34bda58319e3075e8e7dd727ea25a435c upstream. The remapping result of memremap() should be freed with memunmap(), not kfree(). Signed-off-by: Pan Bian Signed-off-by: Ard Biesheuvel Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20171206095010.24170-3-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- drivers/firmware/efi/esrt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/firmware/efi/esrt.c b/drivers/firmware/efi/esrt.c index 7aae2483fcb9..c47e0c6ec00f 100644 --- a/drivers/firmware/efi/esrt.c +++ b/drivers/firmware/efi/esrt.c @@ -428,7 +428,7 @@ static int __init esrt_sysfs_init(void) err_remove_esrt: kobject_put(esrt_kobj); err: - kfree(esrt); + memunmap(esrt); esrt = NULL; return error; } From 2c4c01d13f74ef37299b4e23a95bb0691298370a Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 8 Dec 2017 15:13:27 +0000 Subject: [PATCH 0211/1013] ASN.1: fix out-of-bounds read when parsing indefinite length item commit e0058f3a874ebb48b25be7ff79bc3b4e59929f90 upstream. In asn1_ber_decoder(), indefinitely-sized ASN.1 items were being passed to the action functions before their lengths had been computed, using the bogus length of 0x80 (ASN1_INDEFINITE_LENGTH). This resulted in reading data past the end of the input buffer, when given a specially crafted message. Fix it by rearranging the code so that the indefinite length is resolved before the action is called. This bug was originally found by fuzzing the X.509 parser in userspace using libFuzzer from the LLVM project. 
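[ Editor's note: for readers unfamiliar with BER, a short illustration of
  the indefinite-length form the changelog refers to; the example bytes are
  made up, only the framing matters.  The length octet 0x80 does not mean
  "128 bytes", it means "content runs until an end-of-contents marker", so
  0x80 must never be treated as an actual length:

	static const unsigned char ber_indefinite_example[] = {
		0x30, 0x80,			/* SEQUENCE, indefinite length */
		0x04, 0x03, 'a', 'b', 'c',	/* OCTET STRING, length 3     */
		0x00, 0x00,			/* end-of-contents marker      */
	};

  Before this fix the decoder handed ASN1_INDEFINITE_LENGTH (0x80) to the
  action callbacks as if it were a real length, which is what let a
  callback read past the end of the input buffer. ]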
KASAN report (cleaned up slightly): BUG: KASAN: slab-out-of-bounds in memcpy ./include/linux/string.h:341 [inline] BUG: KASAN: slab-out-of-bounds in x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366 Read of size 128 at addr ffff880035dd9eaf by task keyctl/195 CPU: 1 PID: 195 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bbc6e9 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0xd1/0x175 lib/dump_stack.c:53 print_address_description+0x78/0x260 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x23f/0x350 mm/kasan/report.c:409 memcpy+0x1f/0x50 mm/kasan/kasan.c:302 memcpy ./include/linux/string.h:341 [inline] x509_fabricate_name.constprop.1+0x1a4/0x940 crypto/asymmetric_keys/x509_cert_parser.c:366 asn1_ber_decoder+0xb4a/0x1fd0 lib/asn1_decoder.c:447 x509_cert_parse+0x1c7/0x620 crypto/asymmetric_keys/x509_cert_parser.c:89 x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Allocated by task 195: __do_kmalloc_node mm/slab.c:3675 [inline] __kmalloc_node+0x47/0x60 mm/slab.c:3682 kvmalloc ./include/linux/mm.h:540 [inline] SYSC_add_key security/keys/keyctl.c:104 [inline] SyS_add_key+0x19e/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder") Reported-by: Alexander Potapenko Signed-off-by: Eric Biggers Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- lib/asn1_decoder.c | 47 +++++++++++++++++++++++++--------------------- 1 file changed, 26 insertions(+), 21 deletions(-) diff --git a/lib/asn1_decoder.c b/lib/asn1_decoder.c index 1ef0cec38d78..d77cdfc4b554 100644 --- a/lib/asn1_decoder.c +++ b/lib/asn1_decoder.c @@ -313,42 +313,47 @@ int asn1_ber_decoder(const struct asn1_decoder *decoder, /* Decide how to handle the operation */ switch (op) { - case ASN1_OP_MATCH_ANY_ACT: - case ASN1_OP_MATCH_ANY_ACT_OR_SKIP: - case ASN1_OP_COND_MATCH_ANY_ACT: - case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP: - ret = actions[machine[pc + 1]](context, hdr, tag, data + dp, len); - if (ret < 0) - return ret; - goto skip_data; - - case ASN1_OP_MATCH_ACT: - case ASN1_OP_MATCH_ACT_OR_SKIP: - case ASN1_OP_COND_MATCH_ACT_OR_SKIP: - ret = actions[machine[pc + 2]](context, hdr, tag, data + dp, len); - if (ret < 0) - return ret; - goto skip_data; - case ASN1_OP_MATCH: case ASN1_OP_MATCH_OR_SKIP: + case ASN1_OP_MATCH_ACT: + case ASN1_OP_MATCH_ACT_OR_SKIP: case ASN1_OP_MATCH_ANY: case ASN1_OP_MATCH_ANY_OR_SKIP: + case ASN1_OP_MATCH_ANY_ACT: + case ASN1_OP_MATCH_ANY_ACT_OR_SKIP: case ASN1_OP_COND_MATCH_OR_SKIP: + case ASN1_OP_COND_MATCH_ACT_OR_SKIP: case ASN1_OP_COND_MATCH_ANY: case ASN1_OP_COND_MATCH_ANY_OR_SKIP: - skip_data: + case ASN1_OP_COND_MATCH_ANY_ACT: + case ASN1_OP_COND_MATCH_ANY_ACT_OR_SKIP: + if (!(flags & FLAG_CONS)) { if (flags & FLAG_INDEFINITE_LENGTH) { + size_t tmp = dp; + ret = asn1_find_indefinite_length( - data, datalen, &dp, &len, &errmsg); + data, datalen, &tmp, &len, &errmsg); if (ret < 0) goto error; - } else { - dp += len; } pr_debug("- LEAF: %zu\n", len); } + + if (op & ASN1_OP_MATCH__ACT) { + unsigned char act; + + if (op & ASN1_OP_MATCH__ANY) + act 
= machine[pc + 1]; + else + act = machine[pc + 2]; + ret = actions[act](context, hdr, tag, data + dp, len); + if (ret < 0) + return ret; + } + + if (!(flags & FLAG_CONS)) + dp += len; pc += asn1_op_lengths[op]; goto next_op; From 4c69b34050996d1a1167242f625973dfb9f54271 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 8 Dec 2017 15:13:27 +0000 Subject: [PATCH 0212/1013] ASN.1: check for error from ASN1_OP_END__ACT actions commit 81a7be2cd69b412ab6aeacfe5ebf1bb6e5bce955 upstream. asn1_ber_decoder() was ignoring errors from actions associated with the opcodes ASN1_OP_END_SEQ_ACT, ASN1_OP_END_SET_ACT, ASN1_OP_END_SEQ_OF_ACT, and ASN1_OP_END_SET_OF_ACT. In practice, this meant the pkcs7_note_signed_info() action (since that was the only user of those opcodes). Fix it by checking for the error, just like the decoder does for actions associated with the other opcodes. This bug allowed users to leak slab memory by repeatedly trying to add a specially crafted "pkcs7_test" key (requires CONFIG_PKCS7_TEST_KEY). In theory, this bug could also be used to bypass module signature verification, by providing a PKCS#7 message that is misparsed such that a signature's ->authattrs do not contain its ->msgdigest. But it doesn't seem practical in normal cases, due to restrictions on the format of the ->authattrs. Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder") Signed-off-by: Eric Biggers Signed-off-by: David Howells Reviewed-by: James Morris Signed-off-by: Greg Kroah-Hartman --- lib/asn1_decoder.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/asn1_decoder.c b/lib/asn1_decoder.c index d77cdfc4b554..dc14beae2c9a 100644 --- a/lib/asn1_decoder.c +++ b/lib/asn1_decoder.c @@ -439,6 +439,8 @@ int asn1_ber_decoder(const struct asn1_decoder *decoder, else act = machine[pc + 1]; ret = actions[act](context, hdr, 0, data + tdp, len); + if (ret < 0) + return ret; } pc += asn1_op_lengths[op]; goto next_op; From 69d5894ce0a67d37f900d2597fc0b2b8cef6c863 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 8 Dec 2017 15:13:27 +0000 Subject: [PATCH 0213/1013] KEYS: add missing permission check for request_key() destination commit 4dca6ea1d9432052afb06baf2e3ae78188a4410b upstream. When the request_key() syscall is not passed a destination keyring, it links the requested key (if constructed) into the "default" request-key keyring. This should require Write permission to the keyring. However, there is actually no permission check. This can be abused to add keys to any keyring to which only Search permission is granted. This is because Search permission allows joining the keyring. keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING) then will set the default request-key keyring to the session keyring. Then, request_key() can be used to add keys to the keyring. Both negatively and positively instantiated keys can be added using this method. Adding negative keys is trivial. Adding a positive key is a bit trickier. It requires that either /sbin/request-key positively instantiates the key, or that another thread adds the key to the process keyring at just the right time, such that request_key() misses it initially but then finds it in construct_alloc_key(). Fix this bug by checking for Write permission to the keyring in construct_get_dest_keyring() when the default keyring is being used. We don't do the permission check for non-default keyrings because that was already done by the earlier call to lookup_user_key(). 
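[ Editor's note: a rough userspace sketch of the abuse path described
  above, using libkeyutils; the keyring name and key description are
  hypothetical and the program only illustrates the sequence of calls:

	#include <keyutils.h>
	#include <stdio.h>

	int main(void)
	{
		/* Search permission on "victim-keyring" is enough to join it. */
		key_serial_t sess = keyctl_join_session_keyring("victim-keyring");

		if (sess < 0) {
			perror("join_session_keyring");
			return 1;
		}

		/* Make the session keyring the default request_key() target. */
		if (keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_SESSION_KEYRING) < 0) {
			perror("set_reqkey_keyring");
			return 1;
		}

		/*
		 * With no explicit destination keyring, the result (at least a
		 * negative key on failure) gets linked into the default keyring
		 * -- without any Write permission check before this fix.
		 */
		request_key("user", "attacker-key", NULL, 0);
		return 0;
	}

  Build with -lkeyutils.  With the fix applied, the final link into the
  default keyring requires Write permission, so this sequence no longer
  works against a Search-only keyring. ]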
Also, request_key_and_link() is currently passed a 'struct key *' rather than a key_ref_t, so the "possessed" bit is unavailable. We also don't do the permission check for the "requestor keyring", to continue to support the use case described by commit 8bbf4976b59f ("KEYS: Alter use of key instantiation link-to-keyring argument") where /sbin/request-key recursively calls request_key() to add keys to the original requestor's destination keyring. (I don't know of any users who actually do that, though...) Fixes: 3e30148c3d52 ("[PATCH] Keys: Make request-key create an authorisation key") Signed-off-by: Eric Biggers Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- security/keys/request_key.c | 46 +++++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 9 deletions(-) diff --git a/security/keys/request_key.c b/security/keys/request_key.c index e8036cd0ad54..7dc741382154 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -251,11 +251,12 @@ static int construct_key(struct key *key, const void *callout_info, * The keyring selected is returned with an extra reference upon it which the * caller must release. */ -static void construct_get_dest_keyring(struct key **_dest_keyring) +static int construct_get_dest_keyring(struct key **_dest_keyring) { struct request_key_auth *rka; const struct cred *cred = current_cred(); struct key *dest_keyring = *_dest_keyring, *authkey; + int ret; kenter("%p", dest_keyring); @@ -264,6 +265,8 @@ static void construct_get_dest_keyring(struct key **_dest_keyring) /* the caller supplied one */ key_get(dest_keyring); } else { + bool do_perm_check = true; + /* use a default keyring; falling through the cases until we * find one that we actually have */ switch (cred->jit_keyring) { @@ -278,8 +281,10 @@ static void construct_get_dest_keyring(struct key **_dest_keyring) dest_keyring = key_get(rka->dest_keyring); up_read(&authkey->sem); - if (dest_keyring) + if (dest_keyring) { + do_perm_check = false; break; + } } case KEY_REQKEY_DEFL_THREAD_KEYRING: @@ -314,11 +319,29 @@ static void construct_get_dest_keyring(struct key **_dest_keyring) default: BUG(); } + + /* + * Require Write permission on the keyring. This is essential + * because the default keyring may be the session keyring, and + * joining a keyring only requires Search permission. + * + * However, this check is skipped for the "requestor keyring" so + * that /sbin/request-key can itself use request_key() to add + * keys to the original requestor's destination keyring. 
+ */ + if (dest_keyring && do_perm_check) { + ret = key_permission(make_key_ref(dest_keyring, 1), + KEY_NEED_WRITE); + if (ret) { + key_put(dest_keyring); + return ret; + } + } } *_dest_keyring = dest_keyring; kleave(" [dk %d]", key_serial(dest_keyring)); - return; + return 0; } /* @@ -444,11 +467,15 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx, if (ctx->index_key.type == &key_type_keyring) return ERR_PTR(-EPERM); - user = key_user_lookup(current_fsuid()); - if (!user) - return ERR_PTR(-ENOMEM); + ret = construct_get_dest_keyring(&dest_keyring); + if (ret) + goto error; - construct_get_dest_keyring(&dest_keyring); + user = key_user_lookup(current_fsuid()); + if (!user) { + ret = -ENOMEM; + goto error_put_dest_keyring; + } ret = construct_alloc_key(ctx, dest_keyring, flags, user, &key); key_user_put(user); @@ -463,7 +490,7 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx, } else if (ret == -EINPROGRESS) { ret = 0; } else { - goto couldnt_alloc_key; + goto error_put_dest_keyring; } key_put(dest_keyring); @@ -473,8 +500,9 @@ static struct key *construct_key_and_link(struct keyring_search_context *ctx, construction_failed: key_negate_and_link(key, key_negative_timeout, NULL, NULL); key_put(key); -couldnt_alloc_key: +error_put_dest_keyring: key_put(dest_keyring); +error: kleave(" = %d", ret); return ERR_PTR(ret); } From 28e7c9a8e548c8c7e839f6ae1e21d61dc92308b0 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 8 Dec 2017 15:13:29 +0000 Subject: [PATCH 0214/1013] KEYS: reject NULL restriction string when type is specified commit 18026d866801d0c52e5550210563222bd6c7191d upstream. keyctl_restrict_keyring() allows through a NULL restriction when the "type" is non-NULL, which causes a NULL pointer dereference in asymmetric_lookup_restriction() when it calls strcmp() on the restriction string. But no key types actually use a "NULL restriction" to mean anything, so update keyctl_restrict_keyring() to reject it with EINVAL. Reported-by: syzbot Fixes: 97d3aa0f3134 ("KEYS: Add a lookup_restriction function for the asymmetric key type") Signed-off-by: Eric Biggers Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- security/keys/keyctl.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/security/keys/keyctl.c b/security/keys/keyctl.c index 76d22f726ae4..1ffe60bb2845 100644 --- a/security/keys/keyctl.c +++ b/security/keys/keyctl.c @@ -1588,9 +1588,8 @@ long keyctl_session_to_parent(void) * The caller must have Setattr permission to change keyring restrictions. * * The requested type name may be a NULL pointer to reject all attempts - * to link to the keyring. If _type is non-NULL, _restriction can be - * NULL or a pointer to a string describing the restriction. If _type is - * NULL, _restriction must also be NULL. + * to link to the keyring. In this case, _restriction must also be NULL. + * Otherwise, both _type and _restriction must be non-NULL. * * Returns 0 if successful. 
*/ @@ -1598,7 +1597,6 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, const char __user *_restriction) { key_ref_t key_ref; - bool link_reject = !_type; char type[32]; char *restriction = NULL; long ret; @@ -1607,31 +1605,29 @@ long keyctl_restrict_keyring(key_serial_t id, const char __user *_type, if (IS_ERR(key_ref)) return PTR_ERR(key_ref); + ret = -EINVAL; if (_type) { - ret = key_get_type_from_user(type, _type, sizeof(type)); - if (ret < 0) + if (!_restriction) goto error; - } - if (_restriction) { - if (!_type) { - ret = -EINVAL; + ret = key_get_type_from_user(type, _type, sizeof(type)); + if (ret < 0) goto error; - } restriction = strndup_user(_restriction, PAGE_SIZE); if (IS_ERR(restriction)) { ret = PTR_ERR(restriction); goto error; } + } else { + if (_restriction) + goto error; } - ret = keyring_restrict(key_ref, link_reject ? NULL : type, restriction); + ret = keyring_restrict(key_ref, _type ? type : NULL, restriction); kfree(restriction); - error: key_ref_put(key_ref); - return ret; } From 31b3bcc66f52790010f6a5d4458cabf89f2ffcfa Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 8 Dec 2017 15:13:27 +0000 Subject: [PATCH 0215/1013] X.509: reject invalid BIT STRING for subjectPublicKey commit 0f30cbea005bd3077bd98cd29277d7fc2699c1da upstream. Adding a specially crafted X.509 certificate whose subjectPublicKey ASN.1 value is zero-length caused x509_extract_key_data() to set the public key size to SIZE_MAX, as it subtracted the nonexistent BIT STRING metadata byte. Then, x509_cert_parse() called kmemdup() with that bogus size, triggering the WARN_ON_ONCE() in kmalloc_slab(). This appears to be harmless, but it still must be fixed since WARNs are never supposed to be user-triggerable. Fix it by updating x509_cert_parse() to validate that the value has a BIT STRING metadata byte, and that the byte is 0 which indicates that the number of bits in the bitstring is a multiple of 8. It would be nice to handle the metadata byte in asn1_ber_decoder() instead. But that would be tricky because in the general case a BIT STRING could be implicitly tagged, and/or could legitimately have a length that is not a whole number of bytes. 
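[ Editor's note: an illustration of the encoding being validated; the key
  bytes are placeholders.  In DER, the first content octet of a BIT STRING
  is a metadata byte giving the number of unused bits in the final octet,
  which is why the parser does value + 1 / vlen - 1:

	static const unsigned char subject_public_key_example[] = {
		0x03, 0x05,			/* BIT STRING, content length 5     */
		0x00,				/* 0 unused bits in the last octet  */
		0xde, 0xad, 0xbe, 0xef,		/* key material (placeholder)       */
	};

  A zero-length value has no metadata byte at all, so "vlen - 1" wrapped
  around to SIZE_MAX; the check added below rejects that case and also
  insists the byte is 0, i.e. the key is a whole number of bytes. ]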
Here was the WARN (cleaned up slightly): WARNING: CPU: 1 PID: 202 at mm/slab_common.c:971 kmalloc_slab+0x5d/0x70 mm/slab_common.c:971 Modules linked in: CPU: 1 PID: 202 Comm: keyctl Tainted: G B 4.14.0-09238-g1d3b78bbc6e9 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 task: ffff880033014180 task.stack: ffff8800305c8000 Call Trace: __do_kmalloc mm/slab.c:3706 [inline] __kmalloc_track_caller+0x22/0x2e0 mm/slab.c:3726 kmemdup+0x17/0x40 mm/util.c:118 kmemdup include/linux/string.h:414 [inline] x509_cert_parse+0x2cb/0x620 crypto/asymmetric_keys/x509_cert_parser.c:106 x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Fixes: 42d5ec27f873 ("X.509: Add an ASN.1 decoder") Signed-off-by: Eric Biggers Signed-off-by: David Howells Reviewed-by: James Morris Signed-off-by: Greg Kroah-Hartman --- crypto/asymmetric_keys/x509_cert_parser.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/crypto/asymmetric_keys/x509_cert_parser.c b/crypto/asymmetric_keys/x509_cert_parser.c index dd03fead1ca3..ce2df8c9c583 100644 --- a/crypto/asymmetric_keys/x509_cert_parser.c +++ b/crypto/asymmetric_keys/x509_cert_parser.c @@ -409,6 +409,8 @@ int x509_extract_key_data(void *context, size_t hdrlen, ctx->cert->pub->pkey_algo = "rsa"; /* Discard the BIT STRING metadata */ + if (vlen < 1 || *(const u8 *)value != 0) + return -EBADMSG; ctx->key = value + 1; ctx->key_size = vlen - 1; return 0; From 70feeaaabf4fabf78812d33facd7f8f993b33011 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 8 Dec 2017 15:13:29 +0000 Subject: [PATCH 0216/1013] X.509: fix comparisons of ->pkey_algo commit 54c1fb39fe0495f846539ab765925b008f86801c upstream. ->pkey_algo used to be an enum, but was changed to a string by commit 4e8ae72a75aa ("X.509: Make algo identifiers text instead of enum"). But two comparisons were not updated. Fix them to use strcmp(). This bug broke signature verification in certain configurations, depending on whether the string constants were deduplicated or not. 
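[ Editor's note: a minimal userspace illustration of why the pointer
  comparison only worked by accident.  Whether two identical string
  literals share an address depends on compiler/linker constant merging,
  which is exactly the "deduplicated or not" behaviour mentioned above;
  only strcmp() is reliable:

	#include <stdio.h>
	#include <string.h>

	static const char *algo_a(void) { return "rsa"; }
	static const char *algo_b(void) { return "rsa"; }

	int main(void)
	{
		const char *a = algo_a(), *b = algo_b();

		printf("pointers equal: %d\n", a == b);		/* toolchain dependent */
		printf("strcmp equal:   %d\n", !strcmp(a, b));	/* always 1            */
		return 0;
	}
]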
Fixes: 4e8ae72a75aa ("X.509: Make algo identifiers text instead of enum") Signed-off-by: Eric Biggers Signed-off-by: David Howells Signed-off-by: Greg Kroah-Hartman --- crypto/asymmetric_keys/pkcs7_verify.c | 2 +- crypto/asymmetric_keys/x509_public_key.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/crypto/asymmetric_keys/pkcs7_verify.c b/crypto/asymmetric_keys/pkcs7_verify.c index 2d93d9eccb4d..986033e64a83 100644 --- a/crypto/asymmetric_keys/pkcs7_verify.c +++ b/crypto/asymmetric_keys/pkcs7_verify.c @@ -150,7 +150,7 @@ static int pkcs7_find_key(struct pkcs7_message *pkcs7, pr_devel("Sig %u: Found cert serial match X.509[%u]\n", sinfo->index, certix); - if (x509->pub->pkey_algo != sinfo->sig->pkey_algo) { + if (strcmp(x509->pub->pkey_algo, sinfo->sig->pkey_algo) != 0) { pr_warn("Sig %u: X.509 algo and PKCS#7 sig algo don't match\n", sinfo->index); continue; diff --git a/crypto/asymmetric_keys/x509_public_key.c b/crypto/asymmetric_keys/x509_public_key.c index eea71dc9686c..1bd0cf71a22d 100644 --- a/crypto/asymmetric_keys/x509_public_key.c +++ b/crypto/asymmetric_keys/x509_public_key.c @@ -135,7 +135,7 @@ int x509_check_for_self_signed(struct x509_certificate *cert) } ret = -EKEYREJECTED; - if (cert->pub->pkey_algo != cert->sig->pkey_algo) + if (strcmp(cert->pub->pkey_algo, cert->sig->pkey_algo) != 0) goto out; ret = public_key_verify_signature(cert->pub, cert->sig); From e7bb5cf984ba767bc7badbb4f2bead35b88783d6 Mon Sep 17 00:00:00 2001 From: Chunyu Hu Date: Mon, 27 Nov 2017 22:21:39 +0800 Subject: [PATCH 0217/1013] x86/idt: Load idt early in start_secondary commit 55d2d0ad2fb4325f615d1950486fbc5e6fba1769 upstream. On a secondary, idt is first loaded in cpu_init() with load_current_idt(), i.e. no exceptions can be handled before that point. The conversion of WARN() to use UD requires the IDT being loaded earlier as any warning between start_secondary() and load_curren_idt() in cpu_init() will result in an unhandled @UD exception and therefore fail the bringup of the CPU. Install the IDT handlers right in start_secondary() before calling cpu_init(). [ tglx: Massaged changelog ] Fixes: 9a93848fe787 ("x86/debug: Implement __WARN() using UD0") Signed-off-by: Chunyu Hu Signed-off-by: Thomas Gleixner Cc: peterz@infradead.org Cc: bp@alien8.de Cc: rostedt@goodmis.org Cc: luto@kernel.org Link: https://lkml.kernel.org/r/1511792499-4073-1-git-send-email-chuhu@redhat.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/smpboot.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 65a0ccdc3050..5e0453f18a57 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -239,7 +239,7 @@ static void notrace start_secondary(void *unused) load_cr3(swapper_pg_dir); __flush_tlb_all(); #endif - + load_current_idt(); cpu_init(); x86_cpuinit.early_percpu_clock_init(); preempt_disable(); From d333778b05edfa13a3eb51f3d98efae7b7c2caa8 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 1 Dec 2017 15:08:12 +0100 Subject: [PATCH 0218/1013] x86/PCI: Make broadcom_postcore_init() check acpi_disabled commit ddec3bdee05b06f1dda20ded003c3e10e4184cab upstream. acpi_os_get_root_pointer() may return a valid address even if acpi_disabled is set, but the host bridge information from the ACPI tables is not going to be used in that case and the Broadcom host bridge initialization should not be skipped then, So make broadcom_postcore_init() check acpi_disabled too to avoid this issue. 
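[ Editor's note: the rule the fix enforces, spelled out as a hypothetical
  helper -- this is not code from the tree, just the intent:

	static inline bool acpi_describes_host_bridges(void)
	{
		/* The tables must exist in firmware... */
		if (!acpi_os_get_root_pointer())
			return false;
		/* ...and the kernel must actually use them (not booted acpi=off). */
		return !acpi_disabled;
	}

  broadcom_postcore_init() should only skip its legacy probe when both
  conditions hold, which is what the hunk below checks. ]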
Fixes: 6361d72b04d1 (x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan) Reported-by: Dave Hansen Signed-off-by: Rafael J. Wysocki Signed-off-by: Thomas Gleixner Cc: Bjorn Helgaas Cc: Linux PCI Link: https://lkml.kernel.org/r/3186627.pxZj1QbYNg@aspire.rjw.lan Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/pci/broadcom_bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/pci/broadcom_bus.c b/arch/x86/pci/broadcom_bus.c index bb461cfd01ab..526536c81ddc 100644 --- a/arch/x86/pci/broadcom_bus.c +++ b/arch/x86/pci/broadcom_bus.c @@ -97,7 +97,7 @@ static int __init broadcom_postcore_init(void) * We should get host bridge information from ACPI unless the BIOS * doesn't support it. */ - if (acpi_os_get_root_pointer()) + if (!acpi_disabled && acpi_os_get_root_pointer()) return 0; #endif From 58582f04bc87b9d8d848d9163ce3355dd6f00602 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Radim=20Kr=C4=8Dm=C3=A1=C5=99?= Date: Thu, 30 Nov 2017 19:05:45 +0100 Subject: [PATCH 0219/1013] KVM: x86: fix APIC page invalidation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit b1394e745b9453dcb5b0671c205b770e87dedb87 upstream. Implementation of the unpinned APIC page didn't update the VMCS address cache when invalidation was done through range mmu notifiers. This became a problem when the page notifier was removed. Re-introduce the arch-specific helper and call it from ...range_start. Reported-by: Fabian Grünbichler Fixes: 38b9917350cb ("kvm: vmx: Implement set_apic_access_page_addr") Fixes: 369ea8242c0f ("mm/rmap: update to new mmu_notifier semantic v2") Reviewed-by: Paolo Bonzini Reviewed-by: Andrea Arcangeli Tested-by: Wanpeng Li Tested-by: Fabian Grünbichler Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/kvm_host.h | 3 +++ arch/x86/kvm/x86.c | 14 ++++++++++++++ virt/kvm/kvm_main.c | 8 ++++++++ 3 files changed, 25 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index c73e493adf07..eb38ac9d9a31 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1426,4 +1426,7 @@ static inline int kvm_cpu_get_apicid(int mps_cpu) #endif } +void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, + unsigned long start, unsigned long end); + #endif /* _ASM_X86_KVM_HOST_H */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4195cbcdb310..df62cdc7a258 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6745,6 +6745,20 @@ static void kvm_vcpu_flush_tlb(struct kvm_vcpu *vcpu) kvm_x86_ops->tlb_flush(vcpu); } +void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, + unsigned long start, unsigned long end) +{ + unsigned long apic_address; + + /* + * The physical address of apic access page is stored in the VMCS. + * Update it when it becomes invalid. 
+ */ + apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); + if (start <= apic_address && apic_address < end) + kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); +} + void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) { struct page *page = NULL; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 9deb5a245b83..484e8820c382 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -136,6 +136,11 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm); static unsigned long long kvm_createvm_count; static unsigned long long kvm_active_vms; +__weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, + unsigned long start, unsigned long end) +{ +} + bool kvm_is_reserved_pfn(kvm_pfn_t pfn) { if (pfn_valid(pfn)) @@ -361,6 +366,9 @@ static void kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, kvm_flush_remote_tlbs(kvm); spin_unlock(&kvm->mmu_lock); + + kvm_arch_mmu_notifier_invalidate_range(kvm, start, end); + srcu_read_unlock(&kvm->srcu, idx); } From 85ab61fdfa4866cd0465c75fb97a0601fed9973d Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Mon, 4 Dec 2017 13:11:45 -0500 Subject: [PATCH 0220/1013] btrfs: fix missing error return in btrfs_drop_snapshot commit e19182c0fff451e3744c1107d98f072e7ca377a0 upstream. If btrfs_del_root fails in btrfs_drop_snapshot, we'll pick up the error but then return 0 anyway due to mixing err and ret. Fixes: 79787eaab4612 ("btrfs: replace many BUG_ONs with proper error handling") Signed-off-by: Jeff Mahoney Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/extent-tree.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index e4774c02d922..d227d8514b25 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -9283,6 +9283,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, ret = btrfs_del_root(trans, fs_info, &root->root_key); if (ret) { btrfs_abort_transaction(trans, ret); + err = ret; goto out_end_trans; } From a121ecb3764e84da656b8d20138997f2fd6aed6f Mon Sep 17 00:00:00 2001 From: Jeff Mahoney Date: Tue, 21 Nov 2017 13:58:49 -0500 Subject: [PATCH 0221/1013] btrfs: handle errors while updating refcounts in update_ref_for_cow commit 692826b2738101549f032a761a9191636e83be4e upstream. Since commit fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting time out of commit trans) the assumption that btrfs_add_delayed_{data,tree}_ref can only return 0 or -ENOMEM has been false. The qgroup operations call into btrfs_search_slot and friends and can now return the full spectrum of error codes. Fortunately, the fix here is easy since update_ref_for_cow failing is already handled so we just need to bail early with the error code. Fixes: fb235dc06fa (btrfs: qgroup: Move half of the qgroup accounting ...) 
Signed-off-by: Jeff Mahoney Reviewed-by: Edmund Nadolski Reviewed-by: Qu Wenruo Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/ctree.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index 6d49db7d86be..e2bb2a065741 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -1032,14 +1032,17 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && !(flags & BTRFS_BLOCK_FLAG_FULL_BACKREF)) { ret = btrfs_inc_ref(trans, root, buf, 1); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + return ret; if (root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) { ret = btrfs_dec_ref(trans, root, buf, 0); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + return ret; ret = btrfs_inc_ref(trans, root, cow, 1); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + return ret; } new_flags |= BTRFS_BLOCK_FLAG_FULL_BACKREF; } else { @@ -1049,7 +1052,8 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, ret = btrfs_inc_ref(trans, root, cow, 1); else ret = btrfs_inc_ref(trans, root, cow, 0); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + return ret; } if (new_flags != 0) { int level = btrfs_header_level(buf); @@ -1068,9 +1072,11 @@ static noinline int update_ref_for_cow(struct btrfs_trans_handle *trans, ret = btrfs_inc_ref(trans, root, cow, 1); else ret = btrfs_inc_ref(trans, root, cow, 0); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + return ret; ret = btrfs_dec_ref(trans, root, buf, 1); - BUG_ON(ret); /* -ENOMEM */ + if (ret) + return ret; } clean_tree_block(fs_info, buf); *last_ref = 1; From c6705e4c64eb549c1eb86007438034dd7b7a6129 Mon Sep 17 00:00:00 2001 From: Kailang Yang Date: Tue, 5 Dec 2017 15:38:24 +0800 Subject: [PATCH 0222/1013] ALSA: hda/realtek - New codec support for ALC257 commit f429e7e494afaded76e62c6f98211a635aa03098 upstream. Add new support for ALC257 codec. 
[ It's supposed to be almost equivalent with other ALC25x variants, just adding another type and id -- tiwai ] Signed-off-by: Kailang Yang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 7c39114d124f..b076386c8952 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -330,6 +330,7 @@ static void alc_fill_eapd_coef(struct hda_codec *codec) case 0x10ec0236: case 0x10ec0255: case 0x10ec0256: + case 0x10ec0257: case 0x10ec0282: case 0x10ec0283: case 0x10ec0286: @@ -2749,6 +2750,7 @@ enum { ALC269_TYPE_ALC298, ALC269_TYPE_ALC255, ALC269_TYPE_ALC256, + ALC269_TYPE_ALC257, ALC269_TYPE_ALC215, ALC269_TYPE_ALC225, ALC269_TYPE_ALC294, @@ -2782,6 +2784,7 @@ static int alc269_parse_auto_config(struct hda_codec *codec) case ALC269_TYPE_ALC298: case ALC269_TYPE_ALC255: case ALC269_TYPE_ALC256: + case ALC269_TYPE_ALC257: case ALC269_TYPE_ALC215: case ALC269_TYPE_ALC225: case ALC269_TYPE_ALC294: @@ -6839,6 +6842,10 @@ static int patch_alc269(struct hda_codec *codec) spec->gen.mixer_nid = 0; /* ALC256 does not have any loopback mixer path */ alc_update_coef_idx(codec, 0x36, 1 << 13, 1 << 5); /* Switch pcbeep path to Line in path*/ break; + case 0x10ec0257: + spec->codec_variant = ALC269_TYPE_ALC257; + spec->gen.mixer_nid = 0; + break; case 0x10ec0215: case 0x10ec0285: case 0x10ec0289: @@ -7886,6 +7893,7 @@ static const struct hda_device_id snd_hda_id_realtek[] = { HDA_CODEC_ENTRY(0x10ec0236, "ALC236", patch_alc269), HDA_CODEC_ENTRY(0x10ec0255, "ALC255", patch_alc269), HDA_CODEC_ENTRY(0x10ec0256, "ALC256", patch_alc269), + HDA_CODEC_ENTRY(0x10ec0257, "ALC257", patch_alc269), HDA_CODEC_ENTRY(0x10ec0260, "ALC260", patch_alc260), HDA_CODEC_ENTRY(0x10ec0262, "ALC262", patch_alc262), HDA_CODEC_ENTRY(0x10ec0267, "ALC267", patch_alc268), From 0482dcd51004920b13b59995b7afb66df49937da Mon Sep 17 00:00:00 2001 From: Robb Glasser Date: Tue, 5 Dec 2017 09:16:55 -0800 Subject: [PATCH 0223/1013] ALSA: pcm: prevent UAF in snd_pcm_info commit 362bca57f5d78220f8b5907b875961af9436e229 upstream. When the device descriptor is closed, the `substream->runtime` pointer is freed. But another thread may be in the ioctl handler, case SNDRV_CTL_IOCTL_PCM_INFO. This case calls snd_pcm_info_user() which calls snd_pcm_info() which accesses the now freed `substream->runtime`. Note: this fixes CVE-2017-0861 Signed-off-by: Robb Glasser Signed-off-by: Nick Desaulniers Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/pcm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/core/pcm.c b/sound/core/pcm.c index 7eadb7fd8074..7fea724d093a 100644 --- a/sound/core/pcm.c +++ b/sound/core/pcm.c @@ -153,7 +153,9 @@ static int snd_pcm_control_ioctl(struct snd_card *card, err = -ENXIO; goto _error; } + mutex_lock(&pcm->open_mutex); err = snd_pcm_info_user(substream, info); + mutex_unlock(&pcm->open_mutex); _error: mutex_unlock(®ister_mutex); return err; From 8a4b29a72a7cac97a44dec1b74ec804e0176cca8 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 30 Nov 2017 10:08:28 +0100 Subject: [PATCH 0224/1013] ALSA: seq: Remove spurious WARN_ON() at timer check commit 43a3542870328601be02fcc9d27b09db467336ef upstream. The use of snd_BUG_ON() in ALSA sequencer timer may lead to a spurious WARN_ON() when a slave timer is deployed as its backend and a corresponding master timer stops meanwhile. 
The symptom was triggered by syzkaller spontaneously. Since the NULL timer is valid there, rip off snd_BUG_ON(). Reported-by: syzbot Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/seq/seq_timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/core/seq/seq_timer.c b/sound/core/seq/seq_timer.c index 37d9cfbc29f9..b80985fbc334 100644 --- a/sound/core/seq/seq_timer.c +++ b/sound/core/seq/seq_timer.c @@ -355,7 +355,7 @@ static int initialize_timer(struct snd_seq_timer *tmr) unsigned long freq; t = tmr->timeri->timer; - if (snd_BUG_ON(!t)) + if (!t) return -EINVAL; freq = tmr->preferred_resolution; From 3936d752df21f3f83155df3a19ec3617c18741be Mon Sep 17 00:00:00 2001 From: Jaejoong Kim Date: Mon, 4 Dec 2017 15:31:48 +0900 Subject: [PATCH 0225/1013] ALSA: usb-audio: Fix out-of-bound error commit 251552a2b0d454badc8f486e6d79100970c744b0 upstream. The snd_usb_copy_string_desc() retrieves the usb string corresponding to the index number through the usb_string(). The problem is that the usb_string() returns the length of the string (>= 0) when successful, but it can also return a negative value about the error case or status of usb_control_msg(). If iClockSource is '0' as shown below, usb_string() will returns -EINVAL. This will result in '0' being inserted into buf[-22], and the following KASAN out-of-bound error message will be output. AudioControl Interface Descriptor: bLength 8 bDescriptorType 36 bDescriptorSubtype 10 (CLOCK_SOURCE) bClockID 1 bmAttributes 0x07 Internal programmable Clock (synced to SOF) bmControls 0x07 Clock Frequency Control (read/write) Clock Validity Control (read-only) bAssocTerminal 0 iClockSource 0 To fix it, check usb_string()'return value and bail out. ================================================================== BUG: KASAN: stack-out-of-bounds in parse_audio_unit+0x1327/0x1960 [snd_usb_audio] Write of size 1 at addr ffff88007e66735a by task systemd-udevd/18376 CPU: 0 PID: 18376 Comm: systemd-udevd Not tainted 4.13.0+ #3 Hardware name: LG Electronics 15N540-RFLGL/White Tip Mountain, BIOS 15N5 Call Trace: dump_stack+0x63/0x8d print_address_description+0x70/0x290 ? parse_audio_unit+0x1327/0x1960 [snd_usb_audio] kasan_report+0x265/0x350 __asan_store1+0x4a/0x50 parse_audio_unit+0x1327/0x1960 [snd_usb_audio] ? save_stack+0xb5/0xd0 ? save_stack_trace+0x1b/0x20 ? save_stack+0x46/0xd0 ? kasan_kmalloc+0xad/0xe0 ? kmem_cache_alloc_trace+0xff/0x230 ? snd_usb_create_mixer+0xb0/0x4b0 [snd_usb_audio] ? usb_audio_probe+0x4de/0xf40 [snd_usb_audio] ? usb_probe_interface+0x1f5/0x440 ? driver_probe_device+0x3ed/0x660 ? build_feature_ctl+0xb10/0xb10 [snd_usb_audio] ? save_stack_trace+0x1b/0x20 ? init_object+0x69/0xa0 ? snd_usb_find_csint_desc+0xa8/0xf0 [snd_usb_audio] snd_usb_mixer_controls+0x1dc/0x370 [snd_usb_audio] ? build_audio_procunit+0x890/0x890 [snd_usb_audio] ? snd_usb_create_mixer+0xb0/0x4b0 [snd_usb_audio] ? kmem_cache_alloc_trace+0xff/0x230 ? usb_ifnum_to_if+0xbd/0xf0 snd_usb_create_mixer+0x25b/0x4b0 [snd_usb_audio] ? snd_usb_create_stream+0x255/0x2c0 [snd_usb_audio] usb_audio_probe+0x4de/0xf40 [snd_usb_audio] ? snd_usb_autosuspend.part.7+0x30/0x30 [snd_usb_audio] ? __pm_runtime_idle+0x90/0x90 ? kernfs_activate+0xa6/0xc0 ? usb_match_one_id_intf+0xdc/0x130 ? 
__pm_runtime_set_status+0x2d4/0x450 usb_probe_interface+0x1f5/0x440 Signed-off-by: Jaejoong Kim Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/usb/mixer.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 2b835cca41b1..0f59da604bbc 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -204,6 +204,10 @@ static int snd_usb_copy_string_desc(struct mixer_build *state, int index, char *buf, int maxlen) { int len = usb_string(state->chip->dev, index, buf, maxlen - 1); + + if (len < 0) + return 0; + buf[len] = 0; return len; } From 3884d12e17ab770aa0f5d4bc65bfbfd006f417fa Mon Sep 17 00:00:00 2001 From: Jaejoong Kim Date: Mon, 4 Dec 2017 15:31:49 +0900 Subject: [PATCH 0226/1013] ALSA: usb-audio: Add check return value for usb_string() commit 89b89d121ffcf8d9546633b98ded9d18b8f75891 upstream. snd_usb_copy_string_desc() returns zero if usb_string() fails. In case of failure, we need to check the snd_usb_copy_string_desc()'s return value and add an exception case Signed-off-by: Jaejoong Kim Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/usb/mixer.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 0f59da604bbc..4fde4f8d4444 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2178,13 +2178,14 @@ static int parse_audio_selector_unit(struct mixer_build *state, int unitid, if (len) ; else if (nameid) - snd_usb_copy_string_desc(state, nameid, kctl->id.name, + len = snd_usb_copy_string_desc(state, nameid, kctl->id.name, sizeof(kctl->id.name)); - else { + else len = get_term_name(state, &state->oterm, kctl->id.name, sizeof(kctl->id.name), 0); - if (!len) - strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name)); + + if (!len) { + strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name)); if (desc->bDescriptorSubtype == UAC2_CLOCK_SELECTOR) append_ctl_name(kctl, " Clock Source"); From ce1079588ebc42a2e3a0d310a2ea1f3a75aa8d49 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Thu, 28 Sep 2017 15:14:01 +0100 Subject: [PATCH 0227/1013] iommu/vt-d: Fix scatterlist offset handling commit 29a90b70893817e2f2bb3cea40a29f5308e21b21 upstream. The intel-iommu DMA ops fail to correctly handle scatterlists where sg->offset is greater than PAGE_SIZE - the IOVA allocation is computed appropriately based on the page-aligned portion of the offset, but the mapping is set up relative to sg->page, which means it fails to actually cover the whole buffer (and in the worst case doesn't cover it at all): (sg->dma_address + sg->dma_len) ----+ sg->dma_address ---------+ | iov_pfn------+ | | | | | v v v iova: a b c d e f |--------|--------|--------|--------|--------| <...calculated....> [_____mapped______] pfn: 0 1 2 3 4 5 |--------|--------|--------|--------|--------| ^ ^ ^ | | | sg->page ----+ | | sg->offset --------------+ | (sg->offset + sg->length) ----------+ As a result, the caller ends up overrunning the mapping into whatever lies beyond, which usually goes badly: [ 429.645492] DMAR: DRHD: handling fault status reg 2 [ 429.650847] DMAR: [DMA Write] Request device [02:00.4] fault addr f2682000 ... Whilst this is a fairly rare occurrence, it can happen from the result of intermediate scatterlist processing such as scatterwalk_ffwd() in the crypto layer. Whilst that particular site could be fixed up, it still seems worthwhile to bring intel-iommu in line with other DMA API implementations in handling this robustly. 
To that end, fix the intel_map_sg() path to line up the mapping correctly (in units of MM pages rather than VT-d pages to match the aligned_nrpages() calculation) regardless of the offset, and use sg_phys() consistently for clarity. Reported-by: Harsh Jain Signed-off-by: Robin Murphy Reviewed by: Ashok Raj Tested by: Jacob Pan Signed-off-by: Alex Williamson Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/intel-iommu.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 6784a05dd6b2..83f3d4831f94 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -2254,10 +2254,12 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, uint64_t tmp; if (!sg_res) { + unsigned int pgoff = sg->offset & ~PAGE_MASK; + sg_res = aligned_nrpages(sg->offset, sg->length); - sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + sg->offset; + sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + pgoff; sg->dma_length = sg->length; - pteval = page_to_phys(sg_page(sg)) | prot; + pteval = (sg_phys(sg) - pgoff) | prot; phys_pfn = pteval >> VTD_PAGE_SHIFT; } @@ -3790,7 +3792,7 @@ static int intel_nontranslate_map_sg(struct device *hddev, for_each_sg(sglist, sg, nelems, i) { BUG_ON(!sg_page(sg)); - sg->dma_address = page_to_phys(sg_page(sg)) + sg->offset; + sg->dma_address = sg_phys(sg); sg->dma_length = sg->length; } return nelems; From b68df97ec8d0fd2074c130a64c831b8b760f1228 Mon Sep 17 00:00:00 2001 From: Lai Jiangshan Date: Tue, 28 Nov 2017 21:19:53 +0800 Subject: [PATCH 0228/1013] smp/hotplug: Move step CPUHP_AP_SMPCFD_DYING to the correct place commit 46febd37f9c758b05cd25feae8512f22584742fe upstream. Commit 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine") accidently put this step on the wrong place. The step should be at the cpuhp_ap_states[] rather than the cpuhp_bp_states[]. grep smpcfd /sys/devices/system/cpu/hotplug/states 40: smpcfd:prepare 129: smpcfd:dying "smpcfd:dying" was missing before. So was the invocation of the function smpcfd_dying_cpu(). Fixes: 31487f8328f2 ("smp/cfd: Convert core to hotplug state machine") Signed-off-by: Lai Jiangshan Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Richard Weinberger Cc: Sebastian Andrzej Siewior Cc: Boris Ostrovsky Link: https://lkml.kernel.org/r/20171128131954.81229-1-jiangshanlai@gmail.com Signed-off-by: Greg Kroah-Hartman --- kernel/cpu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/cpu.c b/kernel/cpu.c index 04892a82f6ac..7891aecc6aec 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1289,11 +1289,6 @@ static struct cpuhp_step cpuhp_bp_states[] = { .teardown.single = NULL, .cant_stop = true, }, - [CPUHP_AP_SMPCFD_DYING] = { - .name = "smpcfd:dying", - .startup.single = NULL, - .teardown.single = smpcfd_dying_cpu, - }, /* * Handled on controll processor until the plugged processor manages * this itself. @@ -1335,6 +1330,11 @@ static struct cpuhp_step cpuhp_ap_states[] = { .startup.single = NULL, .teardown.single = rcutree_dying_cpu, }, + [CPUHP_AP_SMPCFD_DYING] = { + .name = "smpcfd:dying", + .startup.single = NULL, + .teardown.single = smpcfd_dying_cpu, + }, /* Entry state on starting. Interrupts enabled from here on. 
Transient * state for synchronsization */ [CPUHP_AP_ONLINE] = { From cb90fcfcdd9062453c463622be3029548966f34a Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 20 Nov 2017 12:38:44 +0100 Subject: [PATCH 0229/1013] s390: always save and restore all registers on context switch commit fbbd7f1a51965b50dd12924841da0d478f3da71b upstream. The switch_to() macro has an optimization to avoid saving and restoring register contents that aren't needed for kernel threads. There is however the possibility that a kernel thread execve's a user space program. In such a case the execve'd process can partially see the contents of the previous process, which shouldn't be allowed. To avoid this, simply always save and restore register contents on context switch. Fixes: fdb6d070effba ("switch_to: dont restore/save access & fpu regs for kernel threads") Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/include/asm/switch_to.h | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/arch/s390/include/asm/switch_to.h b/arch/s390/include/asm/switch_to.h index ec7b476c1ac5..c61b2cc1a8a8 100644 --- a/arch/s390/include/asm/switch_to.h +++ b/arch/s390/include/asm/switch_to.h @@ -30,21 +30,20 @@ static inline void restore_access_regs(unsigned int *acrs) asm volatile("lam 0,15,%0" : : "Q" (*(acrstype *)acrs)); } -#define switch_to(prev,next,last) do { \ - if (prev->mm) { \ - save_fpu_regs(); \ - save_access_regs(&prev->thread.acrs[0]); \ - save_ri_cb(prev->thread.ri_cb); \ - save_gs_cb(prev->thread.gs_cb); \ - } \ +#define switch_to(prev, next, last) do { \ + /* save_fpu_regs() sets the CIF_FPU flag, which enforces \ + * a restore of the floating point / vector registers as \ + * soon as the next task returns to user space \ + */ \ + save_fpu_regs(); \ + save_access_regs(&prev->thread.acrs[0]); \ + save_ri_cb(prev->thread.ri_cb); \ + save_gs_cb(prev->thread.gs_cb); \ update_cr_regs(next); \ - if (next->mm) { \ - set_cpu_flag(CIF_FPU); \ - restore_access_regs(&next->thread.acrs[0]); \ - restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \ - restore_gs_cb(next->thread.gs_cb); \ - } \ - prev = __switch_to(prev,next); \ + restore_access_regs(&next->thread.acrs[0]); \ + restore_ri_cb(next->thread.ri_cb, prev->thread.ri_cb); \ + restore_gs_cb(next->thread.gs_cb); \ + prev = __switch_to(prev, next); \ } while (0) #endif /* __ASM_SWITCH_TO_H */ From eb7fb979f090997654d70c24d975e8d3c5711b32 Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Mon, 4 Dec 2017 09:42:45 +0100 Subject: [PATCH 0230/1013] s390/mm: fix off-by-one bug in 5-level page table handling commit 8d306f53b63099fec2d56300149e400d181ba4f5 upstream. Martin Cermak reported that setting a uprobe doesn't work. Reason for this is that the common uprobes code tries to get an unmapped area at the last possible page within an address space. This broke with commit 1aea9b3f9210 ("s390/mm: implement 5 level pages tables") which introduced an off-by-one bug which prevents to map anything at the last possible page within an address space. The check with the off-by-one bug however can be removed since with commit 8ab867cb0806 ("s390/mm: fix BUG_ON in crst_table_upgrade") the necessary check is done at both call sites. 
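[ Editor's note: the off-by-one in concrete terms, with "end" being the
  exclusive end of the requested mapping (values purely illustrative):

	unsigned long addr = TASK_SIZE_MAX - PAGE_SIZE;	/* last usable page */
	unsigned long end  = addr + PAGE_SIZE;		/* == TASK_SIZE_MAX */

	if (end >= TASK_SIZE_MAX)	/* rejects this perfectly valid request */
		return -ENOMEM;

  An exclusive bound should be tested with ">", and since the call sites
  already enforce the proper limit (see 8ab867cb0806 referenced above),
  the check can simply be dropped, as the hunk below does. ]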
Reported-by: Martin Cermak Bisected-by: Thomas Richter Fixes: 1aea9b3f9210 ("s390/mm: implement 5 level pages tables") Reviewed-by: Hendrik Brueckner Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/mm/pgalloc.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/s390/mm/pgalloc.c b/arch/s390/mm/pgalloc.c index cc2faffa7d6e..334b6d103cbd 100644 --- a/arch/s390/mm/pgalloc.c +++ b/arch/s390/mm/pgalloc.c @@ -85,8 +85,6 @@ int crst_table_upgrade(struct mm_struct *mm, unsigned long end) /* upgrade should only happen from 3 to 4, 3 to 5, or 4 to 5 levels */ VM_BUG_ON(mm->context.asce_limit < _REGION2_SIZE); - if (end >= TASK_SIZE_MAX) - return -ENOMEM; rc = 0; notify = 0; while (mm->context.asce_limit < end) { From 0fad60d717bccf42b60b45969bcddba577785d5d Mon Sep 17 00:00:00 2001 From: Heiko Carstens Date: Wed, 6 Dec 2017 16:11:27 +0100 Subject: [PATCH 0231/1013] s390: fix compat system call table commit e779498df587dd2189b30fe5b9245aefab870eb8 upstream. When wiring up the socket system calls the compat entries were incorrectly set. Not all of them point to the corresponding compat wrapper functions, which clear the upper 33 bits of user space pointers, like it is required. Fixes: 977108f89c989 ("s390: wire up separate socketcalls system calls") Signed-off-by: Heiko Carstens Signed-off-by: Martin Schwidefsky Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/syscalls.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/s390/kernel/syscalls.S b/arch/s390/kernel/syscalls.S index d39f121e67a9..bc905ae1d5c8 100644 --- a/arch/s390/kernel/syscalls.S +++ b/arch/s390/kernel/syscalls.S @@ -370,10 +370,10 @@ SYSCALL(sys_recvmmsg,compat_sys_recvmmsg) SYSCALL(sys_sendmmsg,compat_sys_sendmmsg) SYSCALL(sys_socket,sys_socket) SYSCALL(sys_socketpair,compat_sys_socketpair) /* 360 */ -SYSCALL(sys_bind,sys_bind) -SYSCALL(sys_connect,sys_connect) +SYSCALL(sys_bind,compat_sys_bind) +SYSCALL(sys_connect,compat_sys_connect) SYSCALL(sys_listen,sys_listen) -SYSCALL(sys_accept4,sys_accept4) +SYSCALL(sys_accept4,compat_sys_accept4) SYSCALL(sys_getsockopt,compat_sys_getsockopt) /* 365 */ SYSCALL(sys_setsockopt,compat_sys_setsockopt) SYSCALL(sys_getsockname,compat_sys_getsockname) From 59d97bf77a98adb08ce98073842374f17d7876de Mon Sep 17 00:00:00 2001 From: Janosch Frank Date: Mon, 4 Dec 2017 12:19:11 +0100 Subject: [PATCH 0232/1013] KVM: s390: Fix skey emulation permission check commit ca76ec9ca871e67d8cd0b6caba24aca3d3ac4546 upstream. All skey functions call skey_check_enable at their start, which checks if we are in the PSTATE and injects a privileged operation exception if we are. Unfortunately they continue processing afterwards and perform the operation anyhow as skey_check_enable does not deliver an error if the exception injection was successful. Let's move the PSTATE check into the skey functions and exit them on such an occasion, also we now do not enable skey handling anymore in such a case. 
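[ Editor's note: the shape of the bug, reduced to a sketch.
  kvm_s390_inject_program_int() returns 0 when the exception was queued
  successfully, so the old check inside the shared helper,

	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);

  handed 0 back to its caller, which could not tell "privileged-operation
  exception injected, stop emulating" from "all fine, carry on".  Moving
  the test into handle_iske()/handle_rrbe()/handle_sske(), as the hunks
  below do, lets each handler return right after the injection. ]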
Signed-off-by: Janosch Frank Reviewed-by: Christian Borntraeger Fixes: a7e19ab ("KVM: s390: handle missing storage-key facility") Reviewed-by: Cornelia Huck Reviewed-by: Thomas Huth Signed-off-by: Christian Borntraeger Signed-off-by: Greg Kroah-Hartman --- arch/s390/kvm/priv.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index c954ac49eee4..5b25287f449b 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -235,8 +235,6 @@ static int try_handle_skey(struct kvm_vcpu *vcpu) VCPU_EVENT(vcpu, 4, "%s", "retrying storage key operation"); return -EAGAIN; } - if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) - return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP); return 0; } @@ -247,6 +245,9 @@ static int handle_iske(struct kvm_vcpu *vcpu) int reg1, reg2; int rc; + if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) + return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP); + rc = try_handle_skey(vcpu); if (rc) return rc != -EAGAIN ? rc : 0; @@ -276,6 +277,9 @@ static int handle_rrbe(struct kvm_vcpu *vcpu) int reg1, reg2; int rc; + if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) + return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP); + rc = try_handle_skey(vcpu); if (rc) return rc != -EAGAIN ? rc : 0; @@ -311,6 +315,9 @@ static int handle_sske(struct kvm_vcpu *vcpu) int reg1, reg2; int rc; + if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE) + return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP); + rc = try_handle_skey(vcpu); if (rc) return rc != -EAGAIN ? rc : 0; From d46be2f67c1088077e35ab87da6fc2b0d367fb88 Mon Sep 17 00:00:00 2001 From: David Gibson Date: Mon, 4 Dec 2017 16:27:25 +1100 Subject: [PATCH 0233/1013] Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier" commit ab9dbf771ff9b6b7e814e759213ed01d7f0de320 upstream. This reverts commit a3b2cb30f252b21a6f962e0dd107c8b897ca65e4. That commit tried to fix problems with panic on powerpc in certain circumstances, where some output from the generic panic code was being dropped. Unfortunately, it breaks things worse in other circumstances. In particular when running a PAPR guest, it will now attempt to reboot instead of informing the hypervisor (KVM or PowerVM) that the guest has crashed. The crash notification is important to some virtualization management layers. Revert it for now until we can come up with a better solution. 
Fixes: a3b2cb30f252 ("powerpc: Do not call ppc_md.panic in fadump panic notifier") Signed-off-by: David Gibson [mpe: Tweak change log a bit] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/machdep.h | 1 + arch/powerpc/include/asm/setup.h | 1 + arch/powerpc/kernel/fadump.c | 22 --------------------- arch/powerpc/kernel/setup-common.c | 27 ++++++++++++++++++++++++++ arch/powerpc/platforms/ps3/setup.c | 15 ++++++++++++++ arch/powerpc/platforms/pseries/setup.c | 1 + 6 files changed, 45 insertions(+), 22 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 73b92017b6d7..cd2fc1cc1cc7 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -76,6 +76,7 @@ struct machdep_calls { void __noreturn (*restart)(char *cmd); void __noreturn (*halt)(void); + void (*panic)(char *str); void (*cpu_die)(void); long (*time_init)(void); /* Optional, may be NULL */ diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 257d23dbf55d..cf00ec26303a 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -24,6 +24,7 @@ extern void reloc_got2(unsigned long); void check_for_initrd(void); void initmem_init(void); +void setup_panic(void); #define ARCH_PANIC_TIMEOUT 180 #ifdef CONFIG_PPC_PSERIES diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index e1431800bfb9..29d2b6050140 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -1453,25 +1453,6 @@ static void fadump_init_files(void) return; } -static int fadump_panic_event(struct notifier_block *this, - unsigned long event, void *ptr) -{ - /* - * If firmware-assisted dump has been registered then trigger - * firmware-assisted dump and let firmware handle everything - * else. If this returns, then fadump was not registered, so - * go through the rest of the panic path. - */ - crash_fadump(NULL, ptr); - - return NOTIFY_DONE; -} - -static struct notifier_block fadump_panic_block = { - .notifier_call = fadump_panic_event, - .priority = INT_MIN /* may not return; must be done last */ -}; - /* * Prepare for firmware-assisted dump. */ @@ -1504,9 +1485,6 @@ int __init setup_fadump(void) init_fadump_mem_struct(&fdm, fw_dump.reserve_dump_area_start); fadump_init_files(); - atomic_notifier_chain_register(&panic_notifier_list, - &fadump_panic_block); - return 1; } subsys_initcall(setup_fadump); diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c index 2e3bc16d02b2..90bc20efb4c7 100644 --- a/arch/powerpc/kernel/setup-common.c +++ b/arch/powerpc/kernel/setup-common.c @@ -704,6 +704,30 @@ int check_legacy_ioport(unsigned long base_port) } EXPORT_SYMBOL(check_legacy_ioport); +static int ppc_panic_event(struct notifier_block *this, + unsigned long event, void *ptr) +{ + /* + * If firmware-assisted dump has been registered then trigger + * firmware-assisted dump and let firmware handle everything else. + */ + crash_fadump(NULL, ptr); + ppc_md.panic(ptr); /* May not return */ + return NOTIFY_DONE; +} + +static struct notifier_block ppc_panic_block = { + .notifier_call = ppc_panic_event, + .priority = INT_MIN /* may not return; must be done last */ +}; + +void __init setup_panic(void) +{ + if (!ppc_md.panic) + return; + atomic_notifier_chain_register(&panic_notifier_list, &ppc_panic_block); +} + #ifdef CONFIG_CHECK_CACHE_COHERENCY /* * For platforms that have configurable cache-coherency. 
This function @@ -848,6 +872,9 @@ void __init setup_arch(char **cmdline_p) /* Probe the machine type, establish ppc_md. */ probe_machine(); + /* Setup panic notifier if requested by the platform. */ + setup_panic(); + /* * Configure ppc_md.power_save (ppc32 only, 64-bit machines do * it from their respective probe() function. diff --git a/arch/powerpc/platforms/ps3/setup.c b/arch/powerpc/platforms/ps3/setup.c index 9dabea6e1443..6244bc849469 100644 --- a/arch/powerpc/platforms/ps3/setup.c +++ b/arch/powerpc/platforms/ps3/setup.c @@ -104,6 +104,20 @@ static void __noreturn ps3_halt(void) ps3_sys_manager_halt(); /* never returns */ } +static void ps3_panic(char *str) +{ + DBG("%s:%d %s\n", __func__, __LINE__, str); + + smp_send_stop(); + printk("\n"); + printk(" System does not reboot automatically.\n"); + printk(" Please press POWER button.\n"); + printk("\n"); + + while(1) + lv1_pause(1); +} + #if defined(CONFIG_FB_PS3) || defined(CONFIG_FB_PS3_MODULE) || \ defined(CONFIG_PS3_FLASH) || defined(CONFIG_PS3_FLASH_MODULE) static void __init prealloc(struct ps3_prealloc *p) @@ -255,6 +269,7 @@ define_machine(ps3) { .probe = ps3_probe, .setup_arch = ps3_setup_arch, .init_IRQ = ps3_init_IRQ, + .panic = ps3_panic, .get_boot_time = ps3_get_boot_time, .set_dabr = ps3_set_dabr, .calibrate_decr = ps3_calibrate_decr, diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 5f1beb8367ac..a8531e012658 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -726,6 +726,7 @@ define_machine(pseries) { .pcibios_fixup = pSeries_final_fixup, .restart = rtas_restart, .halt = rtas_halt, + .panic = rtas_os_term, .get_boot_time = rtas_get_boot_time, .get_rtc_time = rtas_get_rtc_time, .set_rtc_time = rtas_set_rtc_time, From f11f5c4b87164c88da85193430f36d7d14bdcec4 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Wed, 6 Dec 2017 18:21:14 +1000 Subject: [PATCH 0234/1013] powerpc/64s: Initialize ISAv3 MMU registers before setting partition table commit 371b80447ff33ddac392c189cf884a5a3e18faeb upstream. kexec can leave MMU registers set when booting into a new kernel, the PIDR (Process Identification Register) in particular. The boot sequence does not zero PIDR, so it only gets set when CPUs first switch to a userspace processes (until then it's running a kernel thread with effective PID = 0). This leaves a window where a process table entry and page tables are set up due to user processes running on other CPUs, that happen to match with a stale PID. The CPU with that PID may cause speculative accesses that address quadrant 0 (aka userspace addresses), which will result in cached translations and PWC (Page Walk Cache) for that process, on a CPU which is not in the mm_cpumask and so they will not be invalidated properly. The most common result is the kernel hanging in infinite page fault loops soon after kexec (usually in schedule_tail, which is usually the first non-speculative quadrant 0 access to a new PID) due to a stale PWC. However being a stale translation error, it could result in anything up to security and data corruption problems. Fix this by zeroing out PIDR at boot and kexec. 
Fixes: 7e381c0ff618 ("powerpc/mm/radix: Add mmu context handling callback for radix") Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/cpu_setup_power.S | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/kernel/cpu_setup_power.S b/arch/powerpc/kernel/cpu_setup_power.S index 610955fe8b81..679bbe714e85 100644 --- a/arch/powerpc/kernel/cpu_setup_power.S +++ b/arch/powerpc/kernel/cpu_setup_power.S @@ -102,6 +102,7 @@ _GLOBAL(__setup_cpu_power9) li r0,0 mtspr SPRN_PSSCR,r0 mtspr SPRN_LPID,r0 + mtspr SPRN_PID,r0 mfspr r3,SPRN_LPCR LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC) or r3, r3, r4 @@ -126,6 +127,7 @@ _GLOBAL(__restore_cpu_power9) li r0,0 mtspr SPRN_PSSCR,r0 mtspr SPRN_LPID,r0 + mtspr SPRN_PID,r0 mfspr r3,SPRN_LPCR LOAD_REG_IMMEDIATE(r4, LPCR_PECEDH | LPCR_PECE_HVEE | LPCR_HVICE | LPCR_HEIC) or r3, r3, r4 From 3b1b28d246c333e9b8ce84e294ae4cad09cfc909 Mon Sep 17 00:00:00 2001 From: Sara Sharon Date: Mon, 8 Feb 2016 23:30:47 +0200 Subject: [PATCH 0235/1013] iwlwifi: mvm: mark MIC stripped MPDUs commit bf19037074e770aad74b3b90f37b8b98db3f3748 upstream. When RADA is active, the hardware decrypts the packets and strips off the MIC as it is useless after decryption. Indicate that to mac80211. [this is needed for the 9000-series HW to work properly] Signed-off-by: Sara Sharon Signed-off-by: Luca Coelho Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c index 248699c2c4bf..402a743b0ad7 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -253,6 +253,8 @@ static int iwl_mvm_rx_crypto(struct iwl_mvm *mvm, struct ieee80211_hdr *hdr, return -1; stats->flag |= RX_FLAG_DECRYPTED; + if (pkt_flags & FH_RSCSR_RADA_EN) + stats->flag |= RX_FLAG_MIC_STRIPPED; *crypt_len = IEEE80211_CCMP_HDR_LEN; return 0; case IWL_RX_MPDU_STATUS_SEC_TKIP: From 3e6c41d3ed846e6e5120e9c139778a8788212ff1 Mon Sep 17 00:00:00 2001 From: Emmanuel Grumbach Date: Mon, 13 Nov 2017 09:50:47 +0200 Subject: [PATCH 0236/1013] iwlwifi: mvm: don't use transmit queue hang detection when it is not possible commit 0b9832b712d6767d6c7b01965fd788d1ca84fc92 upstream. When we act as an AP, new firmware versions handle internally the power saving clients and the driver doesn't know that the peers went to sleep. It is, hence, possible that a peer goes to sleep for a long time and stop pulling frames. This will cause its transmit queue to hang which is a condition that triggers the recovery flow in the driver. While this client is certainly buggy (it should have pulled the frame based on the TIM IE in the beacon), we can't blow up because of a buggy client. Change the current implementation to not enable the transmit queue hang detection on queues that serve peers when we act as an AP / GO. We can still enable this mechanism using the debug configuration which can come in handy when we want to debug why the client doesn't wake up. 
Signed-off-by: Emmanuel Grumbach Signed-off-by: Luca Coelho Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/intel/iwlwifi/mvm/utils.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c index 2ea74abad73d..53e269d54050 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/utils.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c @@ -1143,9 +1143,18 @@ unsigned int iwl_mvm_get_wd_timeout(struct iwl_mvm *mvm, unsigned int default_timeout = cmd_q ? IWL_DEF_WD_TIMEOUT : mvm->cfg->base_params->wd_timeout; - if (!iwl_fw_dbg_trigger_enabled(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS)) + if (!iwl_fw_dbg_trigger_enabled(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS)) { + /* + * We can't know when the station is asleep or awake, so we + * must disable the queue hang detection. + */ + if (fw_has_capa(&mvm->fw->ucode_capa, + IWL_UCODE_TLV_CAPA_STA_PM_NOTIF) && + vif && vif->type == NL80211_IFTYPE_AP) + return IWL_WATCHDOG_DISABLED; return iwlmvm_mod_params.tfd_q_hang_detect ? default_timeout : IWL_WATCHDOG_DISABLED; + } trigger = iwl_fw_dbg_get_trigger(mvm->fw, FW_DBG_TRIGGER_TXQ_TIMERS); txq_timer = (void *)trigger->data; From 60e942644974fe4254b8c6ab6cbbb6e69e88a3f8 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 13 Nov 2017 17:26:09 +0100 Subject: [PATCH 0237/1013] iwlwifi: mvm: flush queue before deleting ROC commit 6c2d49fdc5d947c5fe89935bd52e69f10000f4cb upstream. Before deleting a time event (remain-on-channel instance), flush the queue so that frames cannot get stuck on it. We already flush the AUX STA queues, but a separate station is used for the P2P Device queue. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 2 ++ .../wireless/intel/iwlwifi/mvm/time-event.c | 24 +++++++++++++++++-- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h index 8dcdb522b846..614d48887355 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h @@ -1042,6 +1042,7 @@ struct iwl_mvm { * @IWL_MVM_STATUS_ROC_AUX_RUNNING: AUX remain-on-channel is running * @IWL_MVM_STATUS_D3_RECONFIG: D3 reconfiguration is being done * @IWL_MVM_STATUS_FIRMWARE_RUNNING: firmware is running + * @IWL_MVM_STATUS_NEED_FLUSH_P2P: need to flush P2P bcast STA */ enum iwl_mvm_status { IWL_MVM_STATUS_HW_RFKILL, @@ -1053,6 +1054,7 @@ enum iwl_mvm_status { IWL_MVM_STATUS_ROC_AUX_RUNNING, IWL_MVM_STATUS_D3_RECONFIG, IWL_MVM_STATUS_FIRMWARE_RUNNING, + IWL_MVM_STATUS_NEED_FLUSH_P2P, }; /* Keep track of completed init configuration */ diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c index 4d0314912e94..e25cda9fbf6c 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/time-event.c @@ -132,6 +132,24 @@ void iwl_mvm_roc_done_wk(struct work_struct *wk) * executed, and a new time event means a new command. 
*/ iwl_mvm_flush_sta(mvm, &mvm->aux_sta, true, CMD_ASYNC); + + /* Do the same for the P2P device queue (STA) */ + if (test_and_clear_bit(IWL_MVM_STATUS_NEED_FLUSH_P2P, &mvm->status)) { + struct iwl_mvm_vif *mvmvif; + + /* + * NB: access to this pointer would be racy, but the flush bit + * can only be set when we had a P2P-Device VIF, and we have a + * flush of this work in iwl_mvm_prepare_mac_removal() so it's + * not really racy. + */ + + if (!WARN_ON(!mvm->p2p_device_vif)) { + mvmvif = iwl_mvm_vif_from_mac80211(mvm->p2p_device_vif); + iwl_mvm_flush_sta(mvm, &mvmvif->bcast_sta, true, + CMD_ASYNC); + } + } } static void iwl_mvm_roc_finished(struct iwl_mvm *mvm) @@ -855,10 +873,12 @@ void iwl_mvm_stop_roc(struct iwl_mvm *mvm) mvmvif = iwl_mvm_vif_from_mac80211(te_data->vif); - if (te_data->vif->type == NL80211_IFTYPE_P2P_DEVICE) + if (te_data->vif->type == NL80211_IFTYPE_P2P_DEVICE) { iwl_mvm_remove_time_event(mvm, mvmvif, te_data); - else + set_bit(IWL_MVM_STATUS_NEED_FLUSH_P2P, &mvm->status); + } else { iwl_mvm_remove_aux_roc_te(mvm, mvmvif, te_data); + } iwl_mvm_roc_finished(mvm); } From 26d4d23ae67fc3097e73f50fd55a6cb9fbce2a1e Mon Sep 17 00:00:00 2001 From: Ihab Zhaika Date: Thu, 16 Nov 2017 09:29:19 +0200 Subject: [PATCH 0238/1013] iwlwifi: add new cards for 9260 and 22000 series commit 567deca8e72df3ceb6c07c63f8541a4928f64d3b upstream. add 1 PCI ID for 9260 series and 1 for 22000 series. Signed-off-by: Ihab Zhaika Signed-off-by: Luca Coelho Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/intel/iwlwifi/pcie/drv.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c index 548e1928430d..0f7bd37bf172 100644 --- a/drivers/net/wireless/intel/iwlwifi/pcie/drv.c +++ b/drivers/net/wireless/intel/iwlwifi/pcie/drv.c @@ -551,6 +551,7 @@ static const struct pci_device_id iwl_hw_card_ids[] = { {IWL_PCI_DEVICE(0x271B, 0x0014, iwl9160_2ac_cfg)}, {IWL_PCI_DEVICE(0x271B, 0x0210, iwl9160_2ac_cfg)}, {IWL_PCI_DEVICE(0x271B, 0x0214, iwl9260_2ac_cfg)}, + {IWL_PCI_DEVICE(0x271C, 0x0214, iwl9260_2ac_cfg)}, {IWL_PCI_DEVICE(0x2720, 0x0034, iwl9560_2ac_cfg)}, {IWL_PCI_DEVICE(0x2720, 0x0038, iwl9560_2ac_cfg)}, {IWL_PCI_DEVICE(0x2720, 0x003C, iwl9560_2ac_cfg)}, @@ -662,6 +663,7 @@ static const struct pci_device_id iwl_hw_card_ids[] = { {IWL_PCI_DEVICE(0x2720, 0x0310, iwla000_2ac_cfg_hr_cdb)}, {IWL_PCI_DEVICE(0x40C0, 0x0000, iwla000_2ax_cfg_hr)}, {IWL_PCI_DEVICE(0x40C0, 0x0A10, iwla000_2ax_cfg_hr)}, + {IWL_PCI_DEVICE(0xA0F0, 0x0000, iwla000_2ax_cfg_hr)}, #endif /* CONFIG_IWLMVM */ From 0d46809c6b84cde3d7379f8877bad0528c038460 Mon Sep 17 00:00:00 2001 From: Emmanuel Grumbach Date: Sun, 19 Nov 2017 10:35:14 +0200 Subject: [PATCH 0239/1013] iwlwifi: mvm: fix packet injection commit b13f43a48571f0cd0fda271b5046b65f1f268db5 upstream. We need to have a station and a queue for the monitor interface to be able to inject traffic. We used to have this traffic routed to the auxiliary queue, but this queue isn't scheduled for the station we had linked to the monitor vif. Allocate a new queue, link it to the monitor vif's station and make that queue use the BE fifo. 
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=196715 Signed-off-by: Emmanuel Grumbach Signed-off-by: Luca Coelho Signed-off-by: Greg Kroah-Hartman --- .../net/wireless/intel/iwlwifi/fw/api/txq.h | 4 ++ .../net/wireless/intel/iwlwifi/mvm/mac-ctxt.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/mvm.h | 1 + drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 1 + drivers/net/wireless/intel/iwlwifi/mvm/sta.c | 53 ++++++++++++++----- drivers/net/wireless/intel/iwlwifi/mvm/tx.c | 3 +- 6 files changed, 49 insertions(+), 15 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h index 87b4434224a1..dfa111bb411e 100644 --- a/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h +++ b/drivers/net/wireless/intel/iwlwifi/fw/api/txq.h @@ -68,6 +68,9 @@ * @IWL_MVM_DQA_CMD_QUEUE: a queue reserved for sending HCMDs to the FW * @IWL_MVM_DQA_AUX_QUEUE: a queue reserved for aux frames * @IWL_MVM_DQA_P2P_DEVICE_QUEUE: a queue reserved for P2P device frames + * @IWL_MVM_DQA_INJECT_MONITOR_QUEUE: a queue reserved for injection using + * monitor mode. Note this queue is the same as the queue for P2P device + * but we can't have active monitor mode along with P2P device anyway. * @IWL_MVM_DQA_GCAST_QUEUE: a queue reserved for P2P GO/SoftAP GCAST frames * @IWL_MVM_DQA_BSS_CLIENT_QUEUE: a queue reserved for BSS activity, to ensure * that we are never left without the possibility to connect to an AP. @@ -87,6 +90,7 @@ enum iwl_mvm_dqa_txq { IWL_MVM_DQA_CMD_QUEUE = 0, IWL_MVM_DQA_AUX_QUEUE = 1, IWL_MVM_DQA_P2P_DEVICE_QUEUE = 2, + IWL_MVM_DQA_INJECT_MONITOR_QUEUE = 2, IWL_MVM_DQA_GCAST_QUEUE = 3, IWL_MVM_DQA_BSS_CLIENT_QUEUE = 4, IWL_MVM_DQA_MIN_MGMT_QUEUE = 5, diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c index a2bf530eeae4..2f22e14e00fe 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c @@ -787,7 +787,7 @@ static int iwl_mvm_mac_ctxt_cmd_listener(struct iwl_mvm *mvm, u32 action) { struct iwl_mac_ctx_cmd cmd = {}; - u32 tfd_queue_msk = 0; + u32 tfd_queue_msk = BIT(mvm->snif_queue); int ret; WARN_ON(vif->type != NL80211_IFTYPE_MONITOR); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h index 614d48887355..2ec27ceb8af9 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mvm.h @@ -954,6 +954,7 @@ struct iwl_mvm { /* Tx queues */ u16 aux_queue; + u16 snif_queue; u16 probe_queue; u16 p2p_dev_queue; diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c index 231878969332..9fb40955d5f4 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/ops.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/ops.c @@ -622,6 +622,7 @@ iwl_op_mode_mvm_start(struct iwl_trans *trans, const struct iwl_cfg *cfg, mvm->fw_restart = iwlwifi_mod_params.fw_restart ? 
-1 : 0; mvm->aux_queue = IWL_MVM_DQA_AUX_QUEUE; + mvm->snif_queue = IWL_MVM_DQA_INJECT_MONITOR_QUEUE; mvm->probe_queue = IWL_MVM_DQA_AP_PROBE_RESP_QUEUE; mvm->p2p_dev_queue = IWL_MVM_DQA_P2P_DEVICE_QUEUE; diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c index c4a343534c5e..0d7929799942 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/sta.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/sta.c @@ -1700,29 +1700,29 @@ void iwl_mvm_dealloc_int_sta(struct iwl_mvm *mvm, struct iwl_mvm_int_sta *sta) sta->sta_id = IWL_MVM_INVALID_STA; } -static void iwl_mvm_enable_aux_queue(struct iwl_mvm *mvm) +static void iwl_mvm_enable_aux_snif_queue(struct iwl_mvm *mvm, u16 *queue, + u8 sta_id, u8 fifo) { unsigned int wdg_timeout = iwlmvm_mod_params.tfd_q_hang_detect ? mvm->cfg->base_params->wd_timeout : IWL_WATCHDOG_DISABLED; if (iwl_mvm_has_new_tx_api(mvm)) { - int queue = iwl_mvm_tvqm_enable_txq(mvm, mvm->aux_queue, - mvm->aux_sta.sta_id, - IWL_MAX_TID_COUNT, - wdg_timeout); - mvm->aux_queue = queue; + int tvqm_queue = + iwl_mvm_tvqm_enable_txq(mvm, *queue, sta_id, + IWL_MAX_TID_COUNT, + wdg_timeout); + *queue = tvqm_queue; } else { struct iwl_trans_txq_scd_cfg cfg = { - .fifo = IWL_MVM_TX_FIFO_MCAST, - .sta_id = mvm->aux_sta.sta_id, + .fifo = fifo, + .sta_id = sta_id, .tid = IWL_MAX_TID_COUNT, .aggregate = false, .frame_limit = IWL_FRAME_LIMIT, }; - iwl_mvm_enable_txq(mvm, mvm->aux_queue, mvm->aux_queue, 0, &cfg, - wdg_timeout); + iwl_mvm_enable_txq(mvm, *queue, *queue, 0, &cfg, wdg_timeout); } } @@ -1741,7 +1741,9 @@ int iwl_mvm_add_aux_sta(struct iwl_mvm *mvm) /* Map Aux queue to fifo - needs to happen before adding Aux station */ if (!iwl_mvm_has_new_tx_api(mvm)) - iwl_mvm_enable_aux_queue(mvm); + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->aux_queue, + mvm->aux_sta.sta_id, + IWL_MVM_TX_FIFO_MCAST); ret = iwl_mvm_add_int_sta_common(mvm, &mvm->aux_sta, NULL, MAC_INDEX_AUX, 0); @@ -1755,7 +1757,9 @@ int iwl_mvm_add_aux_sta(struct iwl_mvm *mvm) * to firmware so enable queue here - after the station was added */ if (iwl_mvm_has_new_tx_api(mvm)) - iwl_mvm_enable_aux_queue(mvm); + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->aux_queue, + mvm->aux_sta.sta_id, + IWL_MVM_TX_FIFO_MCAST); return 0; } @@ -1763,10 +1767,31 @@ int iwl_mvm_add_aux_sta(struct iwl_mvm *mvm) int iwl_mvm_add_snif_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif) { struct iwl_mvm_vif *mvmvif = iwl_mvm_vif_from_mac80211(vif); + int ret; lockdep_assert_held(&mvm->mutex); - return iwl_mvm_add_int_sta_common(mvm, &mvm->snif_sta, vif->addr, + + /* Map snif queue to fifo - must happen before adding snif station */ + if (!iwl_mvm_has_new_tx_api(mvm)) + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->snif_queue, + mvm->snif_sta.sta_id, + IWL_MVM_TX_FIFO_BE); + + ret = iwl_mvm_add_int_sta_common(mvm, &mvm->snif_sta, vif->addr, mvmvif->id, 0); + if (ret) + return ret; + + /* + * For 22000 firmware and on we cannot add queue to a station unknown + * to firmware so enable queue here - after the station was added + */ + if (iwl_mvm_has_new_tx_api(mvm)) + iwl_mvm_enable_aux_snif_queue(mvm, &mvm->snif_queue, + mvm->snif_sta.sta_id, + IWL_MVM_TX_FIFO_BE); + + return 0; } int iwl_mvm_rm_snif_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif) @@ -1775,6 +1800,8 @@ int iwl_mvm_rm_snif_sta(struct iwl_mvm *mvm, struct ieee80211_vif *vif) lockdep_assert_held(&mvm->mutex); + iwl_mvm_disable_txq(mvm, mvm->snif_queue, mvm->snif_queue, + IWL_MAX_TID_COUNT, 0); ret = iwl_mvm_rm_sta_common(mvm, mvm->snif_sta.sta_id); if 
(ret) IWL_WARN(mvm, "Failed sending remove station\n"); diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c index 6f2e2af23219..887a504ce64a 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c @@ -657,7 +657,8 @@ int iwl_mvm_tx_skb_non_sta(struct iwl_mvm *mvm, struct sk_buff *skb) if (ap_sta_id != IWL_MVM_INVALID_STA) sta_id = ap_sta_id; } else if (info.control.vif->type == NL80211_IFTYPE_MONITOR) { - queue = mvm->aux_queue; + queue = mvm->snif_queue; + sta_id = mvm->snif_sta.sta_id; } } From 15f36a5ea2278e3ee3fecbdac21895886ee80294 Mon Sep 17 00:00:00 2001 From: David Spinadel Date: Mon, 21 Nov 2016 17:01:25 +0200 Subject: [PATCH 0240/1013] iwlwifi: mvm: enable RX offloading with TKIP and WEP commit 9d0fc5a50a0548f8e5d61243e5e5f26d5c405aef upstream. Set the flag that indicates that ICV was stripped on if this option was enabled in the HW. [this is needed for the 9000-series HW to work properly] Signed-off-by: David Spinadel Signed-off-by: Luca Coelho Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/intel/iwlwifi/iwl-trans.h | 4 +++- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 12 +++++++++--- 2 files changed, 12 insertions(+), 4 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h index e90abbfba718..ecd5c1df811c 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-trans.h +++ b/drivers/net/wireless/intel/iwlwifi/iwl-trans.h @@ -117,6 +117,7 @@ #define FH_RSCSR_FRAME_INVALID 0x55550000 #define FH_RSCSR_FRAME_ALIGN 0x40 #define FH_RSCSR_RPA_EN BIT(25) +#define FH_RSCSR_RADA_EN BIT(26) #define FH_RSCSR_RXQ_POS 16 #define FH_RSCSR_RXQ_MASK 0x3F0000 @@ -128,7 +129,8 @@ struct iwl_rx_packet { * 31: flag flush RB request * 30: flag ignore TC (terminal counter) request * 29: flag fast IRQ request - * 28-26: Reserved + * 28-27: Reserved + * 26: RADA enabled * 25: Offload enabled * 24: RPF enabled * 23: RSS enabled diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c index 402a743b0ad7..819e6f66a5b5 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -232,8 +232,8 @@ static void iwl_mvm_get_signal_strength(struct iwl_mvm *mvm, static int iwl_mvm_rx_crypto(struct iwl_mvm *mvm, struct ieee80211_hdr *hdr, struct ieee80211_rx_status *stats, - struct iwl_rx_mpdu_desc *desc, int queue, - u8 *crypt_len) + struct iwl_rx_mpdu_desc *desc, u32 pkt_flags, + int queue, u8 *crypt_len) { u16 status = le16_to_cpu(desc->status); @@ -272,6 +272,10 @@ static int iwl_mvm_rx_crypto(struct iwl_mvm *mvm, struct ieee80211_hdr *hdr, if ((status & IWL_RX_MPDU_STATUS_SEC_MASK) == IWL_RX_MPDU_STATUS_SEC_WEP) *crypt_len = IEEE80211_WEP_IV_LEN; + + if (pkt_flags & FH_RSCSR_RADA_EN) + stats->flag |= RX_FLAG_ICV_STRIPPED; + return 0; case IWL_RX_MPDU_STATUS_SEC_EXT_ENC: if (!(status & IWL_RX_MPDU_STATUS_MIC_OK)) @@ -812,7 +816,9 @@ void iwl_mvm_rx_mpdu_mq(struct iwl_mvm *mvm, struct napi_struct *napi, rx_status = IEEE80211_SKB_RXCB(skb); - if (iwl_mvm_rx_crypto(mvm, hdr, rx_status, desc, queue, &crypt_len)) { + if (iwl_mvm_rx_crypto(mvm, hdr, rx_status, desc, + le32_to_cpu(pkt->len_n_flags), queue, + &crypt_len)) { kfree_skb(skb); return; } From 01b43f2e3cad60c626daa9b174667a202cee6987 Mon Sep 17 00:00:00 2001 From: Arend Van Spriel Date: Sat, 25 Nov 2017 21:39:25 +0100 Subject: [PATCH 0241/1013] brcmfmac: change driver unbind 
order of the sdio function devices commit 5c3de777bdaf48bd0cfb43097c0d0fb85056cab7 upstream. In the function brcmf_sdio_firmware_callback() the driver is unbound from the sdio function devices in the error path. However, the order in which it is done resulted in a use-after-free issue (see brcmf_ops_sdio_remove() in bcmsdh.c). Hence change the order and first unbind sdio function #2 device and then unbind sdio function #1 device. Fixes: 7a51461fc2da ("brcmfmac: unbind all devices upon failure in firmware callback") Reported-by: Stefan Wahren Reviewed-by: Hante Meuleman Reviewed-by: Pieter-Paul Giesberts Reviewed-by: Franky Lin Signed-off-by: Arend van Spriel Signed-off-by: Kalle Valo Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c index 613caca7dc02..b3fa8ae80465 100644 --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c @@ -4096,8 +4096,8 @@ static void brcmf_sdio_firmware_callback(struct device *dev, int err, sdio_release_host(sdiodev->func[1]); fail: brcmf_dbg(TRACE, "failed: dev=%s, err=%d\n", dev_name(dev), err); - device_release_driver(dev); device_release_driver(&sdiodev->func[2]->dev); + device_release_driver(dev); } struct brcmf_sdio *brcmf_sdio_probe(struct brcmf_sdio_dev *sdiodev) From 425704be0968368005eabbf5c68182a4d4b23c8b Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Mon, 2 Mar 2015 14:13:36 +0000 Subject: [PATCH 0242/1013] kdb: Fix handling of kallsyms_symbol_next() return value commit c07d35338081d107e57cf37572d8cc931a8e32e2 upstream. kallsyms_symbol_next() returns a boolean (true on success). Currently kdb_read() tests the return value with an inequality that unconditionally evaluates to true. This is fixed in the obvious way and, since the conditional branch is supposed to be unreachable, we also add a WARN_ON(). Reported-by: Dan Carpenter Signed-off-by: Daniel Thompson Signed-off-by: Jason Wessel Signed-off-by: Greg Kroah-Hartman --- kernel/debug/kdb/kdb_io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c index e74be38245ad..ed5d34925ad0 100644 --- a/kernel/debug/kdb/kdb_io.c +++ b/kernel/debug/kdb/kdb_io.c @@ -350,7 +350,7 @@ static char *kdb_read(char *buffer, size_t bufsize) } kdb_printf("\n"); for (i = 0; i < count; i++) { - if (kallsyms_symbol_next(p_tmp, i) < 0) + if (WARN_ON(!kallsyms_symbol_next(p_tmp, i))) break; kdb_printf("%s ", p_tmp); *(p_tmp + len) = '\0'; From 55b26ae24c64ee66188952daacb00aa33bd29407 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Sun, 19 Nov 2017 22:17:00 -0800 Subject: [PATCH 0243/1013] md/r5cache: move mddev_lock() out of r5c_journal_mode_set() commit ff35f58e8f8eb520367879a0ccc6f2ec4b62b17b upstream. r5c_journal_mode_set() is called by r5c_journal_mode_store() and raid_ctr() in dm-raid. We don't need mddev_lock() when calling from raid_ctr(). This patch fixes this by moves the mddev_lock() to r5c_journal_mode_store(). 
Signed-off-by: Song Liu Signed-off-by: Shaohua Li Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid5-cache.c | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c index 0b7406ac8ce1..9a340728b846 100644 --- a/drivers/md/raid5-cache.c +++ b/drivers/md/raid5-cache.c @@ -2571,31 +2571,22 @@ static ssize_t r5c_journal_mode_show(struct mddev *mddev, char *page) int r5c_journal_mode_set(struct mddev *mddev, int mode) { struct r5conf *conf; - int err; if (mode < R5C_JOURNAL_MODE_WRITE_THROUGH || mode > R5C_JOURNAL_MODE_WRITE_BACK) return -EINVAL; - err = mddev_lock(mddev); - if (err) - return err; conf = mddev->private; - if (!conf || !conf->log) { - mddev_unlock(mddev); + if (!conf || !conf->log) return -ENODEV; - } if (raid5_calc_degraded(conf) > 0 && - mode == R5C_JOURNAL_MODE_WRITE_BACK) { - mddev_unlock(mddev); + mode == R5C_JOURNAL_MODE_WRITE_BACK) return -EINVAL; - } mddev_suspend(mddev); conf->log->r5c_journal_mode = mode; mddev_resume(mddev); - mddev_unlock(mddev); pr_debug("md/raid:%s: setting r5c cache mode to %d: %s\n", mdname(mddev), mode, r5c_journal_mode_str[mode]); @@ -2608,6 +2599,7 @@ static ssize_t r5c_journal_mode_store(struct mddev *mddev, { int mode = ARRAY_SIZE(r5c_journal_mode_str); size_t len = length; + int ret; if (len < 2) return -EINVAL; @@ -2619,8 +2611,12 @@ static ssize_t r5c_journal_mode_store(struct mddev *mddev, if (strlen(r5c_journal_mode_str[mode]) == len && !strncmp(page, r5c_journal_mode_str[mode], len)) break; - - return r5c_journal_mode_set(mddev, mode) ?: length; + ret = mddev_lock(mddev); + if (ret) + return ret; + ret = r5c_journal_mode_set(mddev, mode); + mddev_unlock(mddev); + return ret ?: length; } struct md_sysfs_entry From 25df8b009734b241ce581ac7505efc587cf0402e Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Tue, 21 Nov 2017 08:49:36 +0100 Subject: [PATCH 0244/1013] drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback commit 510353a63796d467b41237ab4f136136f68c297d upstream. get_modes() callback might be called asynchronously from the DRM core and it is not synchronized with bridge_enable(), which sets proper runtime PM state of the main DP device. Fix this by calling pm_runtime_get_sync() before calling drm_get_edid(), which in turn calls drm_dp_i2c_xfer() and analogix_dp_transfer() to ensure that main DP device is runtime active when doing any access to its registers. 
This fixes the following kernel issue on Samsung Exynos5250 Snow board: Unhandled fault: imprecise external abort (0x406) at 0x00000000 pgd = c0004000 [00000000] *pgd=00000000 Internal error: : 406 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 62 Comm: kworker/0:2 Not tainted 4.13.0-rc2-00364-g4a97a3da420b #3357 Hardware name: SAMSUNG EXYNOS (Flattened Device Tree) Workqueue: events output_poll_execute task: edc14800 task.stack: edcb2000 PC is at analogix_dp_transfer+0x15c/0x2fc LR is at analogix_dp_transfer+0x134/0x2fc pc : [] lr : [] psr: 60000013 sp : edcb3be8 ip : 0000002a fp : 00000001 r10: 00000000 r9 : edcb3cd8 r8 : edcb3c40 r7 : 00000000 r6 : edd3b380 r5 : edd3b010 r4 : 00000064 r3 : 00000000 r2 : f0ad3000 r1 : edcb3c40 r0 : edd3b010 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 10c5387d Table: 4000406a DAC: 00000051 Process kworker/0:2 (pid: 62, stack limit = 0xedcb2210) Stack: (0xedcb3be8 to 0xedcb4000) [] (analogix_dp_transfer) from [] (drm_dp_i2c_do_msg+0x8c/0x2b4) [] (drm_dp_i2c_do_msg) from [] (drm_dp_i2c_xfer+0x98/0x214) [] (drm_dp_i2c_xfer) from [] (__i2c_transfer+0x140/0x29c) [] (__i2c_transfer) from [] (i2c_transfer+0x70/0xe4) [] (i2c_transfer) from [] (drm_do_probe_ddc_edid+0xb4/0x114) [] (drm_do_probe_ddc_edid) from [] (drm_probe_ddc+0x18/0x28) [] (drm_probe_ddc) from [] (drm_get_edid+0x124/0x2d4) [] (drm_get_edid) from [] (analogix_dp_get_modes+0x90/0x114) [] (analogix_dp_get_modes) from [] (drm_helper_probe_single_connector_modes+0x198/0x68c) [] (drm_helper_probe_single_connector_modes) from [] (drm_setup_crtcs+0x1b4/0xd18) [] (drm_setup_crtcs) from [] (drm_fb_helper_hotplug_event+0x94/0xd0) [] (drm_fb_helper_hotplug_event) from [] (drm_kms_helper_hotplug_event+0x24/0x28) [] (drm_kms_helper_hotplug_event) from [] (output_poll_execute+0x6c/0x174) [] (output_poll_execute) from [] (process_one_work+0x188/0x3fc) [] (process_one_work) from [] (worker_thread+0x30/0x4b8) [] (worker_thread) from [] (kthread+0x128/0x164) [] (kthread) from [] (ret_from_fork+0x14/0x24) Code: 0a000002 ea000009 e2544001 0a00004a (e59537c8) ---[ end trace cddc7919c79f7878 ]--- Reported-by: Misha Komarovskiy Signed-off-by: Marek Szyprowski Signed-off-by: Greg Kroah-Hartman Signed-off-by: Archit Taneja Link: https://patchwork.freedesktop.org/patch/msgid/20171121074936.22520-1-m.szyprowski@samsung.com --- drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c index 5dd3f1cd074a..a8905049b9da 100644 --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c @@ -946,7 +946,9 @@ static int analogix_dp_get_modes(struct drm_connector *connector) return 0; } + pm_runtime_get_sync(dp->dev); edid = drm_get_edid(connector, &dp->aux.ddc); + pm_runtime_put(dp->dev); if (edid) { drm_mode_connector_update_edid_property(&dp->connector, edid); From 4b929631c110645b46115d67172d590b210fa8c1 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Wed, 22 Nov 2017 14:14:47 +0100 Subject: [PATCH 0245/1013] drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU commit 120a264f9c2782682027d931d83dcbd22e01da80 upstream. When no IOMMU is available, all GEM buffers allocated by Exynos DRM driver are contiguous, because of the underlying dma_alloc_attrs() function provides only such buffers. In such case it makes no sense to keep BO_NONCONTIG flag for the allocated GEM buffers. 
This allows to avoid failures for buffer contiguity checks in the subsequent operations on GEM objects. Signed-off-by: Marek Szyprowski Signed-off-by: Inki Dae Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/exynos/exynos_drm_gem.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/exynos/exynos_drm_gem.c b/drivers/gpu/drm/exynos/exynos_drm_gem.c index 077de014d610..4400efe3974a 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gem.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gem.c @@ -247,6 +247,15 @@ struct exynos_drm_gem *exynos_drm_gem_create(struct drm_device *dev, if (IS_ERR(exynos_gem)) return exynos_gem; + if (!is_drm_iommu_supported(dev) && (flags & EXYNOS_BO_NONCONTIG)) { + /* + * when no IOMMU is available, all allocated buffers are + * contiguous anyway, so drop EXYNOS_BO_NONCONTIG flag + */ + flags &= ~EXYNOS_BO_NONCONTIG; + DRM_WARN("Non-contiguous allocation is not supported without IOMMU, falling back to contiguous buffer\n"); + } + /* set memory type and cache attribute from user side. */ exynos_gem->flags = flags; From e547af2582b53354073790df012ba3f852526410 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Wed, 29 Nov 2017 17:37:30 +0200 Subject: [PATCH 0246/1013] drm/i915: Fix vblank timestamp/frame counter jumps on gen2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit a87e55f89f0b0dc541d89248a8445635936a3858 upstream. Previously I was under the impression that the scanline counter reads 0 when the pipe is off. Turns out that's not correct, and instead the scanline counter simply stops when the pipe stops, and it retains it's last value until the pipe starts up again, at which point the scanline counter jumps to vblank start. These jumps can cause the timestamp to jump backwards by one frame. Since we use the timestamps to guesstimage also the frame counter value on gen2, that would cause the frame counter to also jump backwards, which leads to a massice difference from the previous value. The end result is that flips/vblank events don't appear to complete as they're stuck waiting for the frame counter to catch up to that massive difference. Fix the problem properly by actually making sure the scanline counter has started to move before we assume that it's safe to enable vblank processing. 
v2: Less pointless duplication in the code (Chris) Cc: Daniel Vetter Cc: Chris Wilson Reviewed-by: Chris Wilson Fixes: b7792d8b54cc ("drm/i915: Wait for pipe to start before sampling vblank timestamps on gen2") Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20171129153732.3612-1-ville.syrjala@linux.intel.com (cherry picked from commit 8fedd64dabc86d0f31a0d1e152be3aa23c323553) Signed-off-by: Joonas Lahtinen Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/intel_display.c | 51 +++++++++++++++++++--------- 1 file changed, 35 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 5ebdb63330dd..1c73d5542681 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -1000,7 +1000,8 @@ enum transcoder intel_pipe_to_cpu_transcoder(struct drm_i915_private *dev_priv, return crtc->config->cpu_transcoder; } -static bool pipe_dsl_stopped(struct drm_i915_private *dev_priv, enum pipe pipe) +static bool pipe_scanline_is_moving(struct drm_i915_private *dev_priv, + enum pipe pipe) { i915_reg_t reg = PIPEDSL(pipe); u32 line1, line2; @@ -1015,7 +1016,28 @@ static bool pipe_dsl_stopped(struct drm_i915_private *dev_priv, enum pipe pipe) msleep(5); line2 = I915_READ(reg) & line_mask; - return line1 == line2; + return line1 != line2; +} + +static void wait_for_pipe_scanline_moving(struct intel_crtc *crtc, bool state) +{ + struct drm_i915_private *dev_priv = to_i915(crtc->base.dev); + enum pipe pipe = crtc->pipe; + + /* Wait for the display line to settle/start moving */ + if (wait_for(pipe_scanline_is_moving(dev_priv, pipe) == state, 100)) + DRM_ERROR("pipe %c scanline %s wait timed out\n", + pipe_name(pipe), onoff(state)); +} + +static void intel_wait_for_pipe_scanline_stopped(struct intel_crtc *crtc) +{ + wait_for_pipe_scanline_moving(crtc, false); +} + +static void intel_wait_for_pipe_scanline_moving(struct intel_crtc *crtc) +{ + wait_for_pipe_scanline_moving(crtc, true); } /* @@ -1038,7 +1060,6 @@ static void intel_wait_for_pipe_off(struct intel_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->base.dev); enum transcoder cpu_transcoder = crtc->config->cpu_transcoder; - enum pipe pipe = crtc->pipe; if (INTEL_GEN(dev_priv) >= 4) { i915_reg_t reg = PIPECONF(cpu_transcoder); @@ -1049,9 +1070,7 @@ static void intel_wait_for_pipe_off(struct intel_crtc *crtc) 100)) WARN(1, "pipe_off wait timed out\n"); } else { - /* Wait for the display line to settle */ - if (wait_for(pipe_dsl_stopped(dev_priv, pipe), 100)) - WARN(1, "pipe_off wait timed out\n"); + intel_wait_for_pipe_scanline_stopped(crtc); } } @@ -1944,15 +1963,14 @@ static void intel_enable_pipe(struct intel_crtc *crtc) POSTING_READ(reg); /* - * Until the pipe starts DSL will read as 0, which would cause - * an apparent vblank timestamp jump, which messes up also the - * frame count when it's derived from the timestamps. So let's - * wait for the pipe to start properly before we call - * drm_crtc_vblank_on() + * Until the pipe starts PIPEDSL reads will return a stale value, + * which causes an apparent vblank timestamp jump when PIPEDSL + * resets to its proper value. That also messes up the frame count + * when it's derived from the timestamps. 
So let's wait for the + * pipe to start properly before we call drm_crtc_vblank_on() */ - if (dev->max_vblank_count == 0 && - wait_for(intel_get_crtc_scanline(crtc) != crtc->scanline_offset, 50)) - DRM_ERROR("pipe %c didn't start\n", pipe_name(pipe)); + if (dev->max_vblank_count == 0) + intel_wait_for_pipe_scanline_moving(crtc); } /** @@ -14682,6 +14700,8 @@ void i830_enable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe) void i830_disable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe) { + struct intel_crtc *crtc = intel_get_crtc_for_pipe(dev_priv, pipe); + DRM_DEBUG_KMS("disabling pipe %c due to force quirk\n", pipe_name(pipe)); @@ -14691,8 +14711,7 @@ void i830_disable_pipe(struct drm_i915_private *dev_priv, enum pipe pipe) I915_WRITE(PIPECONF(pipe), 0); POSTING_READ(PIPECONF(pipe)); - if (wait_for(pipe_dsl_stopped(dev_priv, pipe), 100)) - DRM_ERROR("pipe %c off wait timed out\n", pipe_name(pipe)); + intel_wait_for_pipe_scanline_stopped(crtc); I915_WRITE(DPLL(pipe), DPLL_VGA_MODE_DIS); POSTING_READ(DPLL(pipe)); From 9a11204d2b26324636ff54f8d28095ed5dd17e91 Mon Sep 17 00:00:00 2001 From: Laurent Caumont Date: Sat, 11 Nov 2017 12:44:46 -0500 Subject: [PATCH 0247/1013] media: dvb: i2c transfers over usb cannot be done from stack commit 6d33377f2abbf9f0e561b116dd468d1c3ff36a6a upstream. Signed-off-by: Laurent Caumont Signed-off-by: Sean Young Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/dvb-usb/dibusb-common.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/media/usb/dvb-usb/dibusb-common.c b/drivers/media/usb/dvb-usb/dibusb-common.c index 8207e6900656..bcacb0f22028 100644 --- a/drivers/media/usb/dvb-usb/dibusb-common.c +++ b/drivers/media/usb/dvb-usb/dibusb-common.c @@ -223,8 +223,20 @@ EXPORT_SYMBOL(dibusb_i2c_algo); int dibusb_read_eeprom_byte(struct dvb_usb_device *d, u8 offs, u8 *val) { - u8 wbuf[1] = { offs }; - return dibusb_i2c_msg(d, 0x50, wbuf, 1, val, 1); + u8 *buf; + int rc; + + buf = kmalloc(2, GFP_KERNEL); + if (!buf) + return -ENOMEM; + + buf[0] = offs; + + rc = dibusb_i2c_msg(d, 0x50, &buf[0], 1, &buf[1], 1); + *val = buf[1]; + kfree(buf); + + return rc; } EXPORT_SYMBOL(dibusb_read_eeprom_byte); From 2f2241083a773b52e3319103b0ba2d1331b1f6c0 Mon Sep 17 00:00:00 2001 From: Sean Young Date: Wed, 8 Nov 2017 16:19:45 -0500 Subject: [PATCH 0248/1013] media: rc: sir_ir: detect presence of port commit 30b4e122d71cbec2944a5f8b558b88936ee42f10 upstream. Without this test, sir_ir clumsy claims resources for a device which does not exist. The 0-day kernel test robot reports the following errors (in a loop): sir_ir sir_ir.0: Trapped in interrupt genirq: Flags mismatch irq 4. 00000000 (ttyS0) vs. 00000000 (sir_ir) When sir_ir is loaded with the default io and irq, the following happens: - sir_ir claims irq 4 - user space opens /dev/ttyS0 - in serial8250_do_startup(), some setup is done for ttyS0, which causes irq 4 to fire (in THRE test) - sir_ir does not realise it was not for it, and spins until the "trapped in interrupt" - now serial driver calls setup_irq() and fails and we get the "Flags mismatch" error. There is no port present at 0x3e8 so simply check for the presence of a port, as suggested by Linus. 
Reported-by: kbuild test robot Tested-by: Fengguang Wu Signed-off-by: Sean Young Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/rc/sir_ir.c | 40 +++++++++++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/drivers/media/rc/sir_ir.c b/drivers/media/rc/sir_ir.c index bc906fb128d5..d59918878eb2 100644 --- a/drivers/media/rc/sir_ir.c +++ b/drivers/media/rc/sir_ir.c @@ -57,7 +57,7 @@ static void add_read_queue(int flag, unsigned long val); static irqreturn_t sir_interrupt(int irq, void *dev_id); static void send_space(unsigned long len); static void send_pulse(unsigned long len); -static void init_hardware(void); +static int init_hardware(void); static void drop_hardware(void); /* Initialisation */ @@ -263,11 +263,36 @@ static void send_pulse(unsigned long len) } } -static void init_hardware(void) +static int init_hardware(void) { + u8 scratch, scratch2, scratch3; unsigned long flags; spin_lock_irqsave(&hardware_lock, flags); + + /* + * This is a simple port existence test, borrowed from the autoconfig + * function in drivers/tty/serial/8250/8250_port.c + */ + scratch = sinp(UART_IER); + soutp(UART_IER, 0); +#ifdef __i386__ + outb(0xff, 0x080); +#endif + scratch2 = sinp(UART_IER) & 0x0f; + soutp(UART_IER, 0x0f); +#ifdef __i386__ + outb(0x00, 0x080); +#endif + scratch3 = sinp(UART_IER) & 0x0f; + soutp(UART_IER, scratch); + if (scratch2 != 0 || scratch3 != 0x0f) { + /* we fail, there's nothing here */ + spin_unlock_irqrestore(&hardware_lock, flags); + pr_err("port existence test failed, cannot continue\n"); + return -ENODEV; + } + /* reset UART */ outb(0, io + UART_MCR); outb(0, io + UART_IER); @@ -285,6 +310,8 @@ static void init_hardware(void) /* turn on UART */ outb(UART_MCR_DTR | UART_MCR_RTS | UART_MCR_OUT2, io + UART_MCR); spin_unlock_irqrestore(&hardware_lock, flags); + + return 0; } static void drop_hardware(void) @@ -334,14 +361,19 @@ static int sir_ir_probe(struct platform_device *dev) pr_err("IRQ %d already in use.\n", irq); return retval; } + + retval = init_hardware(); + if (retval) { + del_timer_sync(&timerlist); + return retval; + } + pr_info("I/O port 0x%.4x, IRQ %d.\n", io, irq); retval = devm_rc_register_device(&sir_ir_dev->dev, rcdev); if (retval < 0) return retval; - init_hardware(); - return 0; } From f889ad87b2147dbde1d81715175cf44bdee91fab Mon Sep 17 00:00:00 2001 From: Sean Young Date: Sun, 19 Nov 2017 16:57:27 -0500 Subject: [PATCH 0249/1013] media: rc: partial revert of "media: rc: per-protocol repeat period" commit 67f0f15ad5c47490e19f2526f8f9cea97c5ce1a6 upstream. Since commit d57ea877af38 ("media: rc: per-protocol repeat period"), most IR protocols have a lower keyup timeout. This causes problems on the ite-cir, which has default IR timeout of 200ms. Since the IR decoders read the trailing space, with a IR timeout of 200ms, the last keydown will have at least a delay of 200ms. This is more than the protocol timeout of e.g. rc-6 (which is 164ms). As a result the last IR will be interpreted as a new keydown event, and we get two keypresses. Revert the protocol timeout to 250ms, except for cec which needs a timeout of 550ms. 
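[Illustration, not part of the patch: a quick user-space check of the timing argument above, using only the numbers stated in this message - the 200 ms ite-cir hardware IR timeout, the old 164 ms rc-6 keyup timeout and the new 250 ms value.]

#include <stdio.h>

int main(void)
{
        unsigned int hw_ir_timeout_ms = 200;    /* ite-cir default IR timeout     */
        unsigned int old_repeat_ms = 164;       /* rc-6 keyup timeout before fix  */
        unsigned int new_repeat_ms = 250;       /* keyup timeout after this patch */

        /*
         * The decoder only sees the final frame of a key press once the
         * hardware IR timeout has elapsed, so a keyup timeout shorter than
         * that delay makes the core report a spurious second keydown.
         */
        printf("164 ms keyup timeout: spurious repeat = %s\n",
               old_repeat_ms < hw_ir_timeout_ms ? "yes" : "no");
        printf("250 ms keyup timeout: spurious repeat = %s\n",
               new_repeat_ms < hw_ir_timeout_ms ? "yes" : "no");
        return 0;
}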
Fixes: d57ea877af38 ("media: rc: per-protocol repeat period") Reported-by: Matthias Reichl Signed-off-by: Sean Young Tested-by: Matthias Reichl Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Greg Kroah-Hartman --- drivers/media/rc/rc-main.c | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/drivers/media/rc/rc-main.c b/drivers/media/rc/rc-main.c index 981cccd6b988..72f381522cb2 100644 --- a/drivers/media/rc/rc-main.c +++ b/drivers/media/rc/rc-main.c @@ -38,41 +38,41 @@ static const struct { [RC_PROTO_UNKNOWN] = { .name = "unknown", .repeat_period = 250 }, [RC_PROTO_OTHER] = { .name = "other", .repeat_period = 250 }, [RC_PROTO_RC5] = { .name = "rc-5", - .scancode_bits = 0x1f7f, .repeat_period = 164 }, + .scancode_bits = 0x1f7f, .repeat_period = 250 }, [RC_PROTO_RC5X_20] = { .name = "rc-5x-20", - .scancode_bits = 0x1f7f3f, .repeat_period = 164 }, + .scancode_bits = 0x1f7f3f, .repeat_period = 250 }, [RC_PROTO_RC5_SZ] = { .name = "rc-5-sz", - .scancode_bits = 0x2fff, .repeat_period = 164 }, + .scancode_bits = 0x2fff, .repeat_period = 250 }, [RC_PROTO_JVC] = { .name = "jvc", .scancode_bits = 0xffff, .repeat_period = 250 }, [RC_PROTO_SONY12] = { .name = "sony-12", - .scancode_bits = 0x1f007f, .repeat_period = 100 }, + .scancode_bits = 0x1f007f, .repeat_period = 250 }, [RC_PROTO_SONY15] = { .name = "sony-15", - .scancode_bits = 0xff007f, .repeat_period = 100 }, + .scancode_bits = 0xff007f, .repeat_period = 250 }, [RC_PROTO_SONY20] = { .name = "sony-20", - .scancode_bits = 0x1fff7f, .repeat_period = 100 }, + .scancode_bits = 0x1fff7f, .repeat_period = 250 }, [RC_PROTO_NEC] = { .name = "nec", - .scancode_bits = 0xffff, .repeat_period = 160 }, + .scancode_bits = 0xffff, .repeat_period = 250 }, [RC_PROTO_NECX] = { .name = "nec-x", - .scancode_bits = 0xffffff, .repeat_period = 160 }, + .scancode_bits = 0xffffff, .repeat_period = 250 }, [RC_PROTO_NEC32] = { .name = "nec-32", - .scancode_bits = 0xffffffff, .repeat_period = 160 }, + .scancode_bits = 0xffffffff, .repeat_period = 250 }, [RC_PROTO_SANYO] = { .name = "sanyo", .scancode_bits = 0x1fffff, .repeat_period = 250 }, [RC_PROTO_MCIR2_KBD] = { .name = "mcir2-kbd", - .scancode_bits = 0xffff, .repeat_period = 150 }, + .scancode_bits = 0xffff, .repeat_period = 250 }, [RC_PROTO_MCIR2_MSE] = { .name = "mcir2-mse", - .scancode_bits = 0x1fffff, .repeat_period = 150 }, + .scancode_bits = 0x1fffff, .repeat_period = 250 }, [RC_PROTO_RC6_0] = { .name = "rc-6-0", - .scancode_bits = 0xffff, .repeat_period = 164 }, + .scancode_bits = 0xffff, .repeat_period = 250 }, [RC_PROTO_RC6_6A_20] = { .name = "rc-6-6a-20", - .scancode_bits = 0xfffff, .repeat_period = 164 }, + .scancode_bits = 0xfffff, .repeat_period = 250 }, [RC_PROTO_RC6_6A_24] = { .name = "rc-6-6a-24", - .scancode_bits = 0xffffff, .repeat_period = 164 }, + .scancode_bits = 0xffffff, .repeat_period = 250 }, [RC_PROTO_RC6_6A_32] = { .name = "rc-6-6a-32", - .scancode_bits = 0xffffffff, .repeat_period = 164 }, + .scancode_bits = 0xffffffff, .repeat_period = 250 }, [RC_PROTO_RC6_MCE] = { .name = "rc-6-mce", - .scancode_bits = 0xffff7fff, .repeat_period = 164 }, + .scancode_bits = 0xffff7fff, .repeat_period = 250 }, [RC_PROTO_SHARP] = { .name = "sharp", .scancode_bits = 0x1fff, .repeat_period = 250 }, [RC_PROTO_XMP] = { .name = "xmp", .repeat_period = 250 }, From c4e71b6f7fd75ff372e766e01aded59d869d7cec Mon Sep 17 00:00:00 2001 From: Kristina Martsenko Date: Thu, 16 Nov 2017 17:58:20 +0000 Subject: [PATCH 0250/1013] arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON 
off-by-one commit 26aa7b3b1c0fb3f1a6176a0c1847204ef4355693 upstream. VTTBR_BADDR_MASK is used to sanity check the size and alignment of the VTTBR address. It seems to currently be off by one, thereby only allowing up to 47-bit addresses (instead of 48-bit) and also insufficiently checking the alignment. This patch fixes it. As an example, with 4k pages, before this patch we have: PHYS_MASK_SHIFT = 48 VTTBR_X = 37 - 24 = 13 VTTBR_BADDR_SHIFT = 13 - 1 = 12 VTTBR_BADDR_MASK = ((1 << 35) - 1) << 12 = 0x00007ffffffff000 Which is wrong, because the mask doesn't allow bit 47 of the VTTBR address to be set, and only requires the address to be 12-bit (4k) aligned, while it actually needs to be 13-bit (8k) aligned because we concatenate two 4k tables. With this patch, the mask becomes 0x0000ffffffffe000, which is what we want. Fixes: 0369f6a34b9f ("arm64: KVM: EL2 register definitions") Reviewed-by: Suzuki K Poulose Reviewed-by: Christoffer Dall Signed-off-by: Kristina Martsenko Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/kvm_arm.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h index 61d694c2eae5..555d463c0eaa 100644 --- a/arch/arm64/include/asm/kvm_arm.h +++ b/arch/arm64/include/asm/kvm_arm.h @@ -170,8 +170,7 @@ #define VTCR_EL2_FLAGS (VTCR_EL2_COMMON_BITS | VTCR_EL2_TGRAN_FLAGS) #define VTTBR_X (VTTBR_X_TGRAN_MAGIC - VTCR_EL2_T0SZ_IPA) -#define VTTBR_BADDR_SHIFT (VTTBR_X - 1) -#define VTTBR_BADDR_MASK (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT) +#define VTTBR_BADDR_MASK (((UL(1) << (PHYS_MASK_SHIFT - VTTBR_X)) - 1) << VTTBR_X) #define VTTBR_VMID_SHIFT (UL(48)) #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT) From ba8cbedca6dfca432db2eb9ed4013a2dcb358a71 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 16 Nov 2017 17:58:21 +0000 Subject: [PATCH 0251/1013] arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one commit 5553b142be11e794ebc0805950b2e8313f93d718 upstream. VTTBR_BADDR_MASK is used to sanity check the size and alignment of the VTTBR address. It seems to currently be off by one, thereby only allowing up to 39-bit addresses (instead of 40-bit) and also insufficiently checking the alignment. This patch fixes it. This patch is the 32bit pendent of Kristina's arm64 fix, and she deserves the actual kudos for pinpointing that one. 
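[Illustration, not part of the patch: a user-space check of the mask arithmetic worked through above for the arm64 4k-page case (VTTBR_X = 13, PHYS_MASK_SHIFT = 48). The same off-by-one pattern applies to the 32-bit arm definition fixed below.]

#include <stdio.h>

int main(void)
{
        unsigned long long vttbr_x = 13;                /* 4k pages            */
        unsigned long long phys_mask_shift = 48;
        unsigned long long width = phys_mask_shift - vttbr_x;  /* 35 bits     */

        unsigned long long old_mask = ((1ULL << width) - 1) << (vttbr_x - 1);
        unsigned long long new_mask = ((1ULL << width) - 1) << vttbr_x;

        /* old: 0x00007ffffffff000 - drops bit 47, only checks 4k alignment */
        /* new: 0x0000ffffffffe000 - keeps bit 47, enforces 8k alignment    */
        printf("old mask: 0x%016llx\n", old_mask);
        printf("new mask: 0x%016llx\n", new_mask);
        return 0;
}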
Fixes: f7ed45be3ba52 ("KVM: ARM: World-switch implementation") Reported-by: Kristina Martsenko Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/kvm_arm.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h index c8781450905b..3ab8b3781bfe 100644 --- a/arch/arm/include/asm/kvm_arm.h +++ b/arch/arm/include/asm/kvm_arm.h @@ -161,8 +161,7 @@ #else #define VTTBR_X (5 - KVM_T0SZ) #endif -#define VTTBR_BADDR_SHIFT (VTTBR_X - 1) -#define VTTBR_BADDR_MASK (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_BADDR_SHIFT) +#define VTTBR_BADDR_MASK (((_AC(1, ULL) << (40 - VTTBR_X)) - 1) << VTTBR_X) #define VTTBR_VMID_SHIFT _AC(48, ULL) #define VTTBR_VMID_MASK(size) (_AT(u64, (1 << size) - 1) << VTTBR_VMID_SHIFT) From a52c2829cd60492fc75bafc323145cab1af915f5 Mon Sep 17 00:00:00 2001 From: Andrew Honig Date: Fri, 1 Dec 2017 10:21:09 -0800 Subject: [PATCH 0252/1013] KVM: VMX: remove I/O port 0x80 bypass on Intel hosts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d59d51f088014f25c2562de59b9abff4f42a7468 upstream. This fixes CVE-2017-1000407. KVM allows guests to directly access I/O port 0x80 on Intel hosts. If the guest floods this port with writes it generates exceptions and instability in the host kernel, leading to a crash. With this change guest writes to port 0x80 on Intel will behave the same as they currently behave on AMD systems. Prevent the flooding by removing the code that sets port 0x80 as a passthrough port. This is essentially the same as upstream patch 99f85a28a78e96d28907fe036e1671a218fee597, except that patch was for AMD chipsets and this patch is for Intel. Signed-off-by: Andrew Honig Signed-off-by: Jim Mattson Fixes: fdef3ad1b386 ("KVM: VMX: Enable io bitmaps to avoid IO port 0x80 VMEXITs") Signed-off-by: Radim Krčmář Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b21113bcf227..f366e6d3a5e1 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6750,12 +6750,7 @@ static __init int hardware_setup(void) memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE); memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE); - /* - * Allow direct access to the PC debug port (it is often used for I/O - * delays, but the vmexits simply slow things down). - */ memset(vmx_io_bitmap_a, 0xff, PAGE_SIZE); - clear_bit(0x80, vmx_io_bitmap_a); memset(vmx_io_bitmap_b, 0xff, PAGE_SIZE); From 73c4af9627c0914b81832e162deb34f4894760cb Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Sun, 3 Dec 2017 23:54:41 +0100 Subject: [PATCH 0253/1013] KVM: arm/arm64: Fix broken GICH_ELRSR big endian conversion commit fc396e066318c0a02208c1d3f0b62950a7714999 upstream. We are incorrectly rearranging 32-bit words inside a 64-bit typed value for big endian systems, which would result in never marking a virtual interrupt as inactive on big endian systems (assuming 32 or fewer LRs on the hardware). Fix this by not doing any word order manipulation for the typed values. 
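[Illustration, not part of the patch: a small user-space demonstration of why composing a typed u64 from the two 32-bit halves (named elrsr0/elrsr1 in the hunk below) needs no endian-specific word swap - the shift operates on the value, not on its in-memory byte order, so a bit set for LR0 lands in bit 0 of the result on both little- and big-endian hosts.]

#include <stdint.h>
#include <stdio.h>

int main(void)
{
        uint32_t elrsr0 = 0x00000005;   /* empty-status bits for LRs 0-31  */
        uint32_t elrsr1 = 0x00000000;   /* empty-status bits for LRs 32-63 */

        /* Endian-agnostic: LR0's status ends up in bit 0 of the 64-bit value. */
        uint64_t elrsr = ((uint64_t)elrsr1 << 32) | elrsr0;

        printf("LR0 empty: %s\n", (elrsr & 1) ? "yes" : "no");
        printf("elrsr = 0x%016llx\n", (unsigned long long)elrsr);
        return 0;
}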
Acked-by: Christoffer Dall Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- virt/kvm/arm/hyp/vgic-v2-sr.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/virt/kvm/arm/hyp/vgic-v2-sr.c b/virt/kvm/arm/hyp/vgic-v2-sr.c index a3f18d362366..d7fd46fe9efb 100644 --- a/virt/kvm/arm/hyp/vgic-v2-sr.c +++ b/virt/kvm/arm/hyp/vgic-v2-sr.c @@ -34,11 +34,7 @@ static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base) else elrsr1 = 0; -#ifdef CONFIG_CPU_BIG_ENDIAN - cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1; -#else cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0; -#endif } static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) From af85c1e04ec5737d595a9b4d0af0ef72a3ceb23b Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 16 Nov 2017 17:58:15 +0000 Subject: [PATCH 0254/1013] KVM: arm/arm64: vgic-irqfd: Fix MSI entry allocation commit 150009e2c70cc3c6e97f00e7595055765d32fb85 upstream. Using the size of the structure we're allocating is a good idea and avoids any surprise... In this case, we're happilly confusing kvm_kernel_irq_routing_entry and kvm_irq_routing_entry... Fixes: 95b110ab9a09 ("KVM: arm/arm64: Enable irqchip routing") Reported-by: AKASHI Takahiro Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- virt/kvm/arm/vgic/vgic-irqfd.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/virt/kvm/arm/vgic/vgic-irqfd.c b/virt/kvm/arm/vgic/vgic-irqfd.c index b7baf581611a..99e026d2dade 100644 --- a/virt/kvm/arm/vgic/vgic-irqfd.c +++ b/virt/kvm/arm/vgic/vgic-irqfd.c @@ -112,8 +112,7 @@ int kvm_vgic_setup_default_irq_routing(struct kvm *kvm) u32 nr = dist->nr_spis; int i, ret; - entries = kcalloc(nr, sizeof(struct kvm_kernel_irq_routing_entry), - GFP_KERNEL); + entries = kcalloc(nr, sizeof(*entries), GFP_KERNEL); if (!entries) return -ENOMEM; From c6c0913bd13f89d1a875cc7d85b3646840bead07 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 16 Nov 2017 17:58:16 +0000 Subject: [PATCH 0255/1013] KVM: arm/arm64: vgic: Preserve the revious read from the pending table commit ddb4b0102cb9cdd2398d98b3e1e024e08a2f4239 upstream. The current pending table parsing code assumes that we keep the previous read of the pending bits, but keep that variable in the current block, making sure it is discarded on each loop. We end-up using whatever is on the stack. Who knows, it might just be the right thing... 
Fixes: 280771252c1ba ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES") Reported-by: AKASHI Takahiro Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- virt/kvm/arm/vgic/vgic-v3.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/virt/kvm/arm/vgic/vgic-v3.c b/virt/kvm/arm/vgic/vgic-v3.c index 96ea597db0e7..502f2100e7bf 100644 --- a/virt/kvm/arm/vgic/vgic-v3.c +++ b/virt/kvm/arm/vgic/vgic-v3.c @@ -324,13 +324,13 @@ int vgic_v3_save_pending_tables(struct kvm *kvm) int last_byte_offset = -1; struct vgic_irq *irq; int ret; + u8 val; list_for_each_entry(irq, &dist->lpi_list_head, lpi_list) { int byte_offset, bit_nr; struct kvm_vcpu *vcpu; gpa_t pendbase, ptr; bool stored; - u8 val; vcpu = irq->target_vcpu; if (!vcpu) From fdbc5f3c5ece0a5db265537185d3bb7ff3b52d0d Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 16 Nov 2017 17:58:18 +0000 Subject: [PATCH 0256/1013] KVM: arm/arm64: vgic-its: Check result of allocation before use commit 686f294f2f1ae40705283dd413ca1e4c14f20f93 upstream. We miss a test against NULL after allocation. Fixes: 6d03a68f8054 ("KVM: arm64: vgic-its: Turn device_id validation into generic ID validation") Reported-by: AKASHI Takahiro Acked-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- virt/kvm/arm/vgic/vgic-its.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 547f12dc4d54..3108e07526af 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -775,6 +775,8 @@ static int vgic_its_alloc_collection(struct vgic_its *its, return E_ITS_MAPC_COLLECTION_OOR; collection = kzalloc(sizeof(*collection), GFP_KERNEL); + if (!collection) + return -ENOMEM; collection->collection_id = coll_id; collection->target_addr = COLLECTION_NOT_MAPPED; From d0e9c7727224011c17e37f3415daec8dd7685ca7 Mon Sep 17 00:00:00 2001 From: Dave Martin Date: Tue, 5 Dec 2017 14:56:42 +0000 Subject: [PATCH 0257/1013] arm64: fpsimd: Prevent registers leaking from dead tasks commit 071b6d4a5d343046f253a5a8835d477d93992002 upstream. Currently, loading of a task's fpsimd state into the CPU registers is skipped if that task's state is already present in the registers of that CPU. However, the code relies on the struct fpsimd_state * (and by extension struct task_struct *) to unambiguously identify a task. There is a particular case in which this doesn't work reliably: when a task exits, its task_struct may be recycled to describe a new task. Consider the following scenario: 1) Task P loads its fpsimd state onto cpu C. per_cpu(fpsimd_last_state, C) := P; P->thread.fpsimd_state.cpu := C; 2) Task X is scheduled onto C and loads its fpsimd state on C. per_cpu(fpsimd_last_state, C) := X; X->thread.fpsimd_state.cpu := C; 3) X exits, causing X's task_struct to be freed. 4) P forks a new child T, which obtains X's recycled task_struct. T == X. T->thread.fpsimd_state.cpu == C (inherited from P). 5) T is scheduled on C. T's fpsimd state is not loaded, because per_cpu(fpsimd_last_state, C) == T (== X) && T->thread.fpsimd_state.cpu == C. (This is the check performed by fpsimd_thread_switch().) So, T gets X's registers because the last registers loaded onto C were those of X, in (2). 
This patch fixes the problem by ensuring that the sched-in check fails in (5): fpsimd_flush_task_state(T) is called when T is forked, so that T->thread.fpsimd_state.cpu == C cannot be true. This relies on the fact that T is not schedulable until after copy_thread() completes. Once T's fpsimd state has been loaded on some CPU C there may still be other cpus D for which per_cpu(fpsimd_last_state, D) == &X->thread.fpsimd_state. But D is necessarily != C in this case, and the check in (5) must fail. An alternative fix would be to do refcounting on task_struct. This would result in each CPU holding a reference to the last task whose fpsimd state was loaded there. It's not clear whether this is preferable, and it involves higher overhead than the fix proposed in this patch. It would also move all the task_struct freeing work into the context switch critical section, or otherwise some deferred cleanup mechanism would need to be introduced, neither of which seems obviously justified. Fixes: 005f78cd8849 ("arm64: defer reloading a task's FPSIMD state to userland resume") Signed-off-by: Dave Martin Reviewed-by: Ard Biesheuvel [will: word-smithed the comment so it makes more sense] Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kernel/process.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c index 2dc0f8482210..bcd22d7ee590 100644 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -258,6 +258,15 @@ int copy_thread(unsigned long clone_flags, unsigned long stack_start, memset(&p->thread.cpu_context, 0, sizeof(struct cpu_context)); + /* + * In case p was allocated the same task_struct pointer as some + * other recently-exited task, make sure p is disassociated from + * any cpu that may have run that now-exited task recently. + * Otherwise we could erroneously skip reloading the FPSIMD + * registers for p. + */ + fpsimd_flush_task_state(p); + if (likely(!(p->flags & PF_KTHREAD))) { *childregs = *current_pt_regs(); childregs->regs[0] = 0; From a5347596586db3ab5201ff24be75286dd911f897 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 6 Dec 2017 10:42:10 +0000 Subject: [PATCH 0258/1013] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm commit 0adbdfde8cfc9415aeed2a4955d2d17b3bd9bf13 upstream. update_saved_ttbr0 mandates that mm->pgd is not swapper, since swapper contains kernel mappings and should never be installed into ttbr0. However, this means that callers must avoid passing the init_mm to update_saved_ttbr0 which in turn can cause the saved ttbr0 value to be out-of-date in the context of the idle thread. For example, EFI runtime services may leave the saved ttbr0 pointing at the EFI page table, and kernel threads may end up with stale references to freed page tables. This patch changes update_saved_ttbr0 so that the init_mm points the saved ttbr0 value to the empty zero page, which always exists and never contains valid translations. EFI and switch can then call into update_saved_ttbr0 unconditionally. 
Cc: Mark Rutland Cc: Ard Biesheuvel Cc: Vinayak Menon Fixes: 39bc88e5e38e9b21 ("arm64: Disable TTBR0_EL1 during normal kernel execution") Reviewed-by: Catalin Marinas Reviewed-by: Mark Rutland Reported-by: Vinayak Menon Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/efi.h | 4 +--- arch/arm64/include/asm/mmu_context.h | 22 +++++++++++++--------- 2 files changed, 14 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h index 650344d01124..c4cd5081d78b 100644 --- a/arch/arm64/include/asm/efi.h +++ b/arch/arm64/include/asm/efi.h @@ -132,11 +132,9 @@ static inline void efi_set_pgd(struct mm_struct *mm) * Defer the switch to the current thread's TTBR0_EL1 * until uaccess_enable(). Restore the current * thread's saved ttbr0 corresponding to its active_mm - * (if different from init_mm). */ cpu_set_reserved_ttbr0(); - if (current->active_mm != &init_mm) - update_saved_ttbr0(current, current->active_mm); + update_saved_ttbr0(current, current->active_mm); } } } diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h index 3257895a9b5e..f7773f90546e 100644 --- a/arch/arm64/include/asm/mmu_context.h +++ b/arch/arm64/include/asm/mmu_context.h @@ -174,11 +174,17 @@ enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) static inline void update_saved_ttbr0(struct task_struct *tsk, struct mm_struct *mm) { - if (system_uses_ttbr0_pan()) { - BUG_ON(mm->pgd == swapper_pg_dir); - task_thread_info(tsk)->ttbr0 = - virt_to_phys(mm->pgd) | ASID(mm) << 48; - } + u64 ttbr; + + if (!system_uses_ttbr0_pan()) + return; + + if (mm == &init_mm) + ttbr = __pa_symbol(empty_zero_page); + else + ttbr = virt_to_phys(mm->pgd) | ASID(mm) << 48; + + task_thread_info(tsk)->ttbr0 = ttbr; } #else static inline void update_saved_ttbr0(struct task_struct *tsk, @@ -214,11 +220,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next, * Update the saved TTBR0_EL1 of the scheduled-in task as the previous * value may have not been initialised yet (activate_mm caller) or the * ASID has changed since the last run (following the context switch - * of another thread of the same process). Avoid setting the reserved - * TTBR0_EL1 to swapper_pg_dir (init_mm; e.g. via idle_task_exit). + * of another thread of the same process). */ - if (next != &init_mm) - update_saved_ttbr0(tsk, next); + update_saved_ttbr0(tsk, next); } #define deactivate_mm(tsk,mm) do { } while (0) From e7ef4e829fe1a7e783a5e8233cde4889f4b65729 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Wed, 6 Dec 2017 10:51:12 +0000 Subject: [PATCH 0259/1013] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb commit d96cc49bff5a7735576cc6f6f111f875d101cec8 upstream. enter_lazy_tlb is called when a kernel thread rides on the back of another mm, due to a context switch or an explicit call to unuse_mm where a call to switch_mm is elided. In these cases, it's important to keep the saved ttbr value up to date with the active mm, otherwise we can end up with a stale value which points to a potentially freed page table. This patch implements enter_lazy_tlb for arm64, so that the saved ttbr0 is kept up-to-date with the active mm for kernel threads. 
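For readers unfamiliar with lazy TLB: when the scheduler switches to a kernel thread, which has no mm of its own, it borrows the previous task's mm instead of switching page tables and calls enter_lazy_tlb(). The sketch below is a simplified paraphrase of that scheduler path (details and locking elided), showing where the arm64 hook added by this patch runs:

    /* Simplified paraphrase of context_switch() handling a kernel thread. */
    if (!next->mm) {                                /* kernel thread */
            next->active_mm = prev->active_mm;      /* borrow the previous mm */
            mmgrab(prev->active_mm);                /* keep it alive while borrowed */
            enter_lazy_tlb(prev->active_mm, next);  /* arch hook: on arm64 this now
                                                     * repoints the saved ttbr0 at
                                                     * the zero page via init_mm */
    } else {
            switch_mm_irqs_off(prev->active_mm, next->mm, next);
    }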
Cc: Mark Rutland Cc: Ard Biesheuvel Cc: Vinayak Menon Fixes: 39bc88e5e38e9b21 ("arm64: Disable TTBR0_EL1 during normal kernel execution") Reviewed-by: Catalin Marinas Reviewed-by: Mark Rutland Reported-by: Vinayak Menon Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/mmu_context.h | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h index f7773f90546e..9d155fa9a507 100644 --- a/arch/arm64/include/asm/mmu_context.h +++ b/arch/arm64/include/asm/mmu_context.h @@ -156,20 +156,6 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu); #define init_new_context(tsk,mm) ({ atomic64_set(&(mm)->context.id, 0); 0; }) -/* - * This is called when "tsk" is about to enter lazy TLB mode. - * - * mm: describes the currently active mm context - * tsk: task which is entering lazy tlb - * cpu: cpu number which is entering lazy tlb - * - * tsk->mm will be NULL - */ -static inline void -enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) -{ -} - #ifdef CONFIG_ARM64_SW_TTBR0_PAN static inline void update_saved_ttbr0(struct task_struct *tsk, struct mm_struct *mm) @@ -193,6 +179,16 @@ static inline void update_saved_ttbr0(struct task_struct *tsk, } #endif +static inline void +enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk) +{ + /* + * We don't actually care about the ttbr0 mapping, so point it at the + * zero page. + */ + update_saved_ttbr0(tsk, &init_mm); +} + static inline void __switch_mm(struct mm_struct *next) { unsigned int cpu = smp_processor_id(); From 1d6c92408159ccfdc60cd988fe229804447f9c7e Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Wed, 15 Nov 2017 10:03:53 -0200 Subject: [PATCH 0260/1013] Revert "ARM: dts: imx53: add srtc node" commit e501506d3ea00eefa64463ebd9e5c13ee70990bd upstream. This reverts commit 5b725054147deaf966b3919e10a86c6bfe946a18. The rtc block on i.MX53 is a completely different hardware than the one found on i.MX25. Reported-by: Noel Vellemans Suggested-by: Juergen Borleis Signed-off-by: Fabio Estevam Signed-off-by: Shawn Guo Signed-off-by: Greg Kroah-Hartman --- arch/arm/boot/dts/imx53.dtsi | 9 --------- 1 file changed, 9 deletions(-) diff --git a/arch/arm/boot/dts/imx53.dtsi b/arch/arm/boot/dts/imx53.dtsi index 8bf0d89cdd35..2e516f4985e4 100644 --- a/arch/arm/boot/dts/imx53.dtsi +++ b/arch/arm/boot/dts/imx53.dtsi @@ -433,15 +433,6 @@ clock-names = "ipg", "per"; }; - srtc: srtc@53fa4000 { - compatible = "fsl,imx53-rtc", "fsl,imx25-rtc"; - reg = <0x53fa4000 0x4000>; - interrupts = <24>; - interrupt-parent = <&tzic>; - clocks = <&clks IMX5_CLK_SRTC_GATE>; - clock-names = "ipg"; - }; - iomuxc: iomuxc@53fa8000 { compatible = "fsl,imx53-iomuxc"; reg = <0x53fa8000 0x4000>; From 2ced9e2a850c40d3bfd7af4791f985318facff68 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 3 Oct 2017 18:14:13 +0100 Subject: [PATCH 0261/1013] bus: arm-cci: Fix use of smp_processor_id() in preemptible context commit 4608af8aa53e7f3922ddee695d023b7bcd5cb35b upstream. The ARM CCI driver seem to be using smp_processor_id() in a preemptible context, which is likely to make a DEBUG_PREMPT kernel scream at boot time. Turn this into a get_cpu()/put_cpu() that extends over the CPU hotplug registration, making sure that we don't race against a CPU down operation. 
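The idiom used by this fix (and by the arm-ccn one further down): smp_processor_id() is only meaningful while preemption is disabled, because the task may migrate to another CPU immediately afterwards, which is exactly what DEBUG_PREEMPT warns about. get_cpu() returns the current CPU id with preemption disabled, and put_cpu() re-enables preemption once the id is no longer relied upon. A rough sketch, with "pmu" standing in for the driver's private data:

    int cpu, ret;

    cpu = get_cpu();                        /* disables preemption */
    cpumask_set_cpu(cpu, &pmu->cpus);       /* record the CPU we will use */

    ret = cpuhp_state_add_instance_nocalls(CPUHP_AP_PERF_ARM_CCI_ONLINE,
                                           &pmu->node);
    put_cpu();                              /* safe to migrate again; remember to
                                             * call put_cpu() on error paths too */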
Signed-off-by: Marc Zyngier Acked-by: Mark Rutland Signed-off-by: Pawel Moll Signed-off-by: Greg Kroah-Hartman --- drivers/bus/arm-cci.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/bus/arm-cci.c b/drivers/bus/arm-cci.c index 3c29d36702a8..5426c04fe24b 100644 --- a/drivers/bus/arm-cci.c +++ b/drivers/bus/arm-cci.c @@ -1755,14 +1755,17 @@ static int cci_pmu_probe(struct platform_device *pdev) raw_spin_lock_init(&cci_pmu->hw_events.pmu_lock); mutex_init(&cci_pmu->reserve_mutex); atomic_set(&cci_pmu->active_events, 0); - cpumask_set_cpu(smp_processor_id(), &cci_pmu->cpus); + cpumask_set_cpu(get_cpu(), &cci_pmu->cpus); ret = cci_pmu_init(cci_pmu, pdev); - if (ret) + if (ret) { + put_cpu(); return ret; + } cpuhp_state_add_instance_nocalls(CPUHP_AP_PERF_ARM_CCI_ONLINE, &cci_pmu->node); + put_cpu(); pr_info("ARM %s PMU driver probed", cci_pmu->model->name); return 0; } From a724b569a50d2e94fd2d3ea123e1b5ad32293c61 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 27 Aug 2017 11:06:50 +0100 Subject: [PATCH 0262/1013] bus: arm-ccn: Check memory allocation failure commit 24771179c5c138f0ea3ef88b7972979f62f2d5db upstream. Check memory allocation failures and return -ENOMEM in such cases This avoids a potential NULL pointer dereference. Signed-off-by: Christophe JAILLET Acked-by: Scott Branden Signed-off-by: Pawel Moll Signed-off-by: Greg Kroah-Hartman --- drivers/bus/arm-ccn.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c index e8c6946fed9d..be4bdead4075 100644 --- a/drivers/bus/arm-ccn.c +++ b/drivers/bus/arm-ccn.c @@ -1271,6 +1271,10 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn) int len = snprintf(NULL, 0, "ccn_%d", ccn->dt.id); name = devm_kzalloc(ccn->dev, len + 1, GFP_KERNEL); + if (!name) { + err = -ENOMEM; + goto error_choose_name; + } snprintf(name, len + 1, "ccn_%d", ccn->dt.id); } @@ -1318,6 +1322,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn) error_pmu_register: error_set_affinity: +error_choose_name: ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id); for (i = 0; i < ccn->num_xps; i++) writel(0, ccn->xp[i].base + CCN_XP_DT_CONTROL); From 8741b5ab49403be43771adc57e1c0031514910b9 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 3 Oct 2017 18:14:12 +0100 Subject: [PATCH 0263/1013] bus: arm-ccn: Fix use of smp_processor_id() in preemptible context commit b18c2b9487d8e797fc0a757e57ac3645348c5fba upstream. Booting a DEBUG_PREEMPT enabled kernel on a CCN-based system results in the following splat: [...] arm-ccn e8000000.ccn: No access to interrupts, using timer. BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id+0x1c/0x28 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.13.0 #6111 Hardware name: AMD Seattle/Seattle, BIOS 17:08:23 Jun 26 2017 Call trace: [] dump_backtrace+0x0/0x278 [] show_stack+0x24/0x30 [] dump_stack+0x8c/0xb0 [] check_preemption_disabled+0xfc/0x100 [] debug_smp_processor_id+0x1c/0x28 [] arm_ccn_probe+0x358/0x4f0 [...] as we use smp_processor_id() in the wrong context. Turn this into a get_cpu()/put_cpu() that extends over the CPU hotplug registration, making sure that we don't race against a CPU down operation. 
Signed-off-by: Marc Zyngier Acked-by: Mark Rutland Signed-off-by: Pawel Moll Signed-off-by: Greg Kroah-Hartman --- drivers/bus/arm-ccn.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c index be4bdead4075..66cb12655e84 100644 --- a/drivers/bus/arm-ccn.c +++ b/drivers/bus/arm-ccn.c @@ -1301,7 +1301,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn) } /* Pick one CPU which we will use to collect data from CCN... */ - cpumask_set_cpu(smp_processor_id(), &ccn->dt.cpu); + cpumask_set_cpu(get_cpu(), &ccn->dt.cpu); /* Also make sure that the overflow interrupt is handled by this CPU */ if (ccn->irq) { @@ -1318,10 +1318,12 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn) cpuhp_state_add_instance_nocalls(CPUHP_AP_PERF_ARM_CCN_ONLINE, &ccn->dt.node); + put_cpu(); return 0; error_pmu_register: error_set_affinity: + put_cpu(); error_choose_name: ida_simple_remove(&arm_ccn_pmu_ida, ccn->dt.id); for (i = 0; i < ccn->num_xps; i++) From 1cef55be2b6bba7f4c262b35db385425ae1bf04e Mon Sep 17 00:00:00 2001 From: Kim Phillips Date: Wed, 11 Oct 2017 22:33:24 +0100 Subject: [PATCH 0264/1013] bus: arm-ccn: fix module unloading Error: Removing state 147 which has instances left. commit b69f63ebf553504739cc8534cbed31bd530c6f0b upstream. Unregistering the driver before calling cpuhp_remove_multi_state() removes any remaining hotplug cpu instances so __cpuhp_remove_state_cpuslocked() doesn't emit this warning: [ 268.748362] Error: Removing state 147 which has instances left. [ 268.748373] ------------[ cut here ]------------ [ 268.748386] WARNING: CPU: 2 PID: 5476 at kernel/cpu.c:1734 __cpuhp_remove_state_cpuslocked+0x454/0x4f0 [ 268.748389] Modules linked in: arm_ccn(-) [last unloaded: arm_ccn] [ 268.748403] CPU: 2 PID: 5476 Comm: rmmod Tainted: G W 4.14.0-rc4+ #3 [ 268.748406] Hardware name: AMD Seattle/Seattle, BIOS 10:18:39 Dec 8 2016 [ 268.748410] task: ffff8001a18ca000 task.stack: ffff80019c120000 [ 268.748416] PC is at __cpuhp_remove_state_cpuslocked+0x454/0x4f0 [ 268.748421] LR is at __cpuhp_remove_state_cpuslocked+0x448/0x4f0 [ 268.748425] pc : [] lr : [] pstate: 60000145 [ 268.748427] sp : ffff80019c127d30 [ 268.748430] x29: ffff80019c127d30 x28: ffff8001a18ca000 [ 268.748437] x27: ffff20000c2cb000 x26: 1fffe4000042d490 [ 268.748443] x25: ffff20000216a480 x24: 0000000000000000 [ 268.748449] x23: ffff20000b08e000 x22: 0000000000000001 [ 268.748455] x21: 0000000000000093 x20: 00000000000016f8 [ 268.748460] x19: ffff20000c2cbb80 x18: 0000ffffb5fe7c58 [ 268.748466] x17: 00000000004402d0 x16: 1fffe40001864f01 [ 268.748472] x15: ffff20000c4bf8b0 x14: 0000000000000000 [ 268.748477] x13: 0000000000007032 x12: ffff20000829ae48 [ 268.748483] x11: ffff20000c4bf000 x10: 0000000000000004 [ 268.748488] x9 : 0000000000006fbc x8 : ffff20000c318a40 [ 268.748494] x7 : 0000000000000000 x6 : ffff040001864f02 [ 268.748500] x5 : 0000000000000000 x4 : 0000000000000000 [ 268.748505] x3 : 0000000000000007 x2 : dfff200000000000 [ 268.748510] x1 : 000000000000ad3d x0 : 00000000000001f0 [ 268.748516] Call trace: [ 268.748521] Exception stack(0xffff80019c127bf0 to 0xffff80019c127d30) [ 268.748526] 7be0: 00000000000001f0 000000000000ad3d [ 268.748531] 7c00: dfff200000000000 0000000000000007 0000000000000000 0000000000000000 [ 268.748535] 7c20: ffff040001864f02 0000000000000000 ffff20000c318a40 0000000000006fbc [ 268.748539] 7c40: 0000000000000004 ffff20000c4bf000 ffff20000829ae48 0000000000007032 [ 268.748544] 7c60: 0000000000000000 ffff20000c4bf8b0 1fffe40001864f01 
00000000004402d0 [ 268.748548] 7c80: 0000ffffb5fe7c58 ffff20000c2cbb80 00000000000016f8 0000000000000093 [ 268.748553] 7ca0: 0000000000000001 ffff20000b08e000 0000000000000000 ffff20000216a480 [ 268.748557] 7cc0: 1fffe4000042d490 ffff20000c2cb000 ffff8001a18ca000 ffff80019c127d30 [ 268.748562] 7ce0: ffff2000081729e0 ffff80019c127d30 ffff2000081729ec 0000000060000145 [ 268.748566] 7d00: 00000000000001f0 0000000000000000 0001000000000000 0000000000000000 [ 268.748569] 7d20: ffff80019c127d30 ffff2000081729ec [ 268.748575] [] __cpuhp_remove_state_cpuslocked+0x454/0x4f0 [ 268.748580] [] __cpuhp_remove_state+0x54/0x80 [ 268.748597] [] arm_ccn_exit+0x2c/0x70 [arm_ccn] [ 268.748604] [] SyS_delete_module+0x5a4/0x708 [ 268.748607] Exception stack(0xffff80019c127ec0 to 0xffff80019c128000) [ 268.748612] 7ec0: 0000000019bb7258 0000000000000800 ba64d0fb3d26a800 00000000000000da [ 268.748616] 7ee0: 0000ffffb6144e28 0000ffffcd95b409 fefefefefefefeff 7f7f7f7f7f7f7f7f [ 268.748621] 7f00: 000000000000006a 1999999999999999 0000ffffb6179000 0000000000bbcc6d [ 268.748625] 7f20: 0000ffffb6176b98 0000ffffcd95c2d0 0000ffffb5fe7b58 0000ffffb6163000 [ 268.748630] 7f40: 0000ffffb60ad3e0 00000000004402d0 0000ffffb5fe7c58 0000000019bb71f0 [ 268.748634] 7f60: 0000ffffcd95c740 0000000000000000 0000000019bb71f0 0000000000416700 [ 268.748639] 7f80: 0000000000000000 00000000004402e8 0000000019bb6010 0000ffffcd95c748 [ 268.748643] 7fa0: 0000000000000000 0000ffffcd95c460 00000000004113a8 0000ffffcd95c460 [ 268.748648] 7fc0: 0000ffffb60ad3e8 0000000080000000 0000000019bb7258 000000000000006a [ 268.748652] 7fe0: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 268.748657] [] __sys_trace_return+0x0/0x4 [ 268.748661] ---[ end trace a996d358dcaa7f9c ]--- Fixes: 8df038725ad5 ("bus/arm-ccn: Use cpu-hp's multi instance support instead custom list") Signed-off-by: Kim Phillips Acked-by: Sebastian Andrzej Siewior Signed-off-by: Pawel Moll Signed-off-by: Greg Kroah-Hartman --- drivers/bus/arm-ccn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c index 66cb12655e84..03d7faf51c2b 100644 --- a/drivers/bus/arm-ccn.c +++ b/drivers/bus/arm-ccn.c @@ -1587,8 +1587,8 @@ static int __init arm_ccn_init(void) static void __exit arm_ccn_exit(void) { - cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_CCN_ONLINE); platform_driver_unregister(&arm_ccn_driver); + cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_CCN_ONLINE); } module_init(arm_ccn_init); From 45f846ca6b2119237adfa19a9bbfce43dadb558e Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Thu, 2 Nov 2017 15:22:27 +0200 Subject: [PATCH 0265/1013] IB/core: Avoid unnecessary return value check commit 2e4c85c6edc80fa532b2c7e1eb3597ef4d4bbb8f upstream. Since there is nothing done with non zero return value, such check is avoided. 
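The cleanup that follows is the usual "return the call directly" simplification. It is only safe when nothing has to happen between the call and the return, no unlocking and no freeing, which is the case here. In sketch form, with do_check() as a hypothetical stand-in:

    /* Before: the result is checked, but nothing else is done with it. */
    ret = do_check(dev, port_num, pkey_index, sec);
    if (ret)
            return ret;
    return 0;

    /* After: equivalent, and makes it obvious that no cleanup is skipped. */
    return do_check(dev, port_num, pkey_index, sec);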
Signed-off-by: Parav Pandit Reviewed-by: Daniel Jurgens Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/security.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c index 28607bb42d87..23278ed5be45 100644 --- a/drivers/infiniband/core/security.c +++ b/drivers/infiniband/core/security.c @@ -697,20 +697,13 @@ void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent) int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index) { - int ret; - if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed) return -EACCES; - ret = ib_security_pkey_access(map->agent.device, - map->agent.port_num, - pkey_index, - map->agent.security); - - if (ret) - return ret; - - return 0; + return ib_security_pkey_access(map->agent.device, + map->agent.port_num, + pkey_index, + map->agent.security); } #endif /* CONFIG_SECURITY_INFINIBAND */ From 796c9d1e275e3e35ae13b16021498bc12ab2593e Mon Sep 17 00:00:00 2001 From: Daniel Jurgens Date: Wed, 29 Nov 2017 20:10:39 +0200 Subject: [PATCH 0266/1013] IB/core: Only enforce security for InfiniBand commit 315d160c5a4e034a576a13aa21e7235d5c9ec609 upstream. For now the only LSM security enforcement mechanism available is specific to InfiniBand. Bypass enforcement for non-IB link types. This fixes a regression where modify_qp fails for iWARP because querying the PKEY returns -EINVAL. Cc: Paul Moore Cc: Don Dutile Reported-by: Potnuri Bharat Teja Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs") Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens Reviewed-by: Parav Pandit Tested-by: Potnuri Bharat Teja Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/security.c | 50 +++++++++++++++++++++++++++--- 1 file changed, 46 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c index 23278ed5be45..a337386652b0 100644 --- a/drivers/infiniband/core/security.c +++ b/drivers/infiniband/core/security.c @@ -417,8 +417,17 @@ void ib_close_shared_qp_security(struct ib_qp_security *sec) int ib_create_qp_security(struct ib_qp *qp, struct ib_device *dev) { + u8 i = rdma_start_port(dev); + bool is_ib = false; int ret; + while (i <= rdma_end_port(dev) && !is_ib) + is_ib = rdma_protocol_ib(dev, i++); + + /* If this isn't an IB device don't create the security context */ + if (!is_ib) + return 0; + qp->qp_sec = kzalloc(sizeof(*qp->qp_sec), GFP_KERNEL); if (!qp->qp_sec) return -ENOMEM; @@ -441,6 +450,10 @@ EXPORT_SYMBOL(ib_create_qp_security); void ib_destroy_qp_security_begin(struct ib_qp_security *sec) { + /* Return if not IB */ + if (!sec) + return; + mutex_lock(&sec->mutex); /* Remove the QP from the lists so it won't get added to @@ -470,6 +483,10 @@ void ib_destroy_qp_security_abort(struct ib_qp_security *sec) int ret; int i; + /* Return if not IB */ + if (!sec) + return; + /* If a concurrent cache update is in progress this * QP security could be marked for an error state * transition. Wait for this to complete. 
@@ -505,6 +522,10 @@ void ib_destroy_qp_security_end(struct ib_qp_security *sec) { int i; + /* Return if not IB */ + if (!sec) + return; + /* If a concurrent cache update is occurring we must * wait until this QP security structure is processed * in the QP to error flow before destroying it because @@ -557,7 +578,7 @@ int ib_security_modify_qp(struct ib_qp *qp, { int ret = 0; struct ib_ports_pkeys *tmp_pps; - struct ib_ports_pkeys *new_pps; + struct ib_ports_pkeys *new_pps = NULL; struct ib_qp *real_qp = qp->real_qp; bool special_qp = (real_qp->qp_type == IB_QPT_SMI || real_qp->qp_type == IB_QPT_GSI || @@ -565,18 +586,27 @@ int ib_security_modify_qp(struct ib_qp *qp, bool pps_change = ((qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) || (qp_attr_mask & IB_QP_ALT_PATH)); + WARN_ONCE((qp_attr_mask & IB_QP_PORT && + rdma_protocol_ib(real_qp->device, qp_attr->port_num) && + !real_qp->qp_sec), + "%s: QP security is not initialized for IB QP: %d\n", + __func__, real_qp->qp_num); + /* The port/pkey settings are maintained only for the real QP. Open * handles on the real QP will be in the shared_qp_list. When * enforcing security on the real QP all the shared QPs will be * checked as well. */ - if (pps_change && !special_qp) { + if (pps_change && !special_qp && real_qp->qp_sec) { mutex_lock(&real_qp->qp_sec->mutex); new_pps = get_new_pps(real_qp, qp_attr, qp_attr_mask); - + if (!new_pps) { + mutex_unlock(&real_qp->qp_sec->mutex); + return -ENOMEM; + } /* Add this QP to the lists for the new port * and pkey settings before checking for permission * in case there is a concurrent cache update @@ -600,7 +630,7 @@ int ib_security_modify_qp(struct ib_qp *qp, qp_attr_mask, udata); - if (pps_change && !special_qp) { + if (new_pps) { /* Clean up the lists and free the appropriate * ports_pkeys structure. */ @@ -631,6 +661,9 @@ int ib_security_pkey_access(struct ib_device *dev, u16 pkey; int ret; + if (!rdma_protocol_ib(dev, port_num)) + return 0; + ret = ib_get_cached_pkey(dev, port_num, pkey_index, &pkey); if (ret) return ret; @@ -665,6 +698,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent, { int ret; + if (!rdma_protocol_ib(agent->device, agent->port_num)) + return 0; + ret = security_ib_alloc_security(&agent->security); if (ret) return ret; @@ -690,6 +726,9 @@ int ib_mad_agent_security_setup(struct ib_mad_agent *agent, void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent) { + if (!rdma_protocol_ib(agent->device, agent->port_num)) + return; + security_ib_free_security(agent->security); if (agent->lsm_nb_reg) unregister_lsm_notifier(&agent->lsm_nb); @@ -697,6 +736,9 @@ void ib_mad_agent_security_cleanup(struct ib_mad_agent *agent) int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index) { + if (!rdma_protocol_ib(map->agent.device, map->agent.port_num)) + return 0; + if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed) return -EACCES; From 7b9cf144dc0b164761212586dd86ace41fd601f4 Mon Sep 17 00:00:00 2001 From: LEROY Christophe Date: Fri, 6 Oct 2017 15:04:33 +0200 Subject: [PATCH 0267/1013] crypto: talitos - fix AEAD test failures commit ec8c7d14acc0a477429d3a6fade5dab72c996c82 upstream. AEAD tests fail when destination SG list has more than 1 element. 
[ 2.058752] alg: aead: Test 1 failed on encryption for authenc-hmac-sha1-cbc-aes-talitos [ 2.066965] 00000000: 53 69 6e 67 6c 65 20 62 6c 6f 63 6b 20 6d 73 67 00000010: c0 43 ff 74 c0 43 ff e0 de 83 d1 20 de 84 8e 54 00000020: de 83 d7 c4 [ 2.082138] alg: aead: Test 1 failed on encryption for authenc-hmac-sha1-cbc-aes-talitos [ 2.090435] 00000000: 53 69 6e 67 6c 65 20 62 6c 6f 63 6b 20 6d 73 67 00000010: de 84 ea 58 c0 93 1a 24 de 84 e8 59 de 84 f1 20 00000020: 00 00 00 00 [ 2.105721] alg: aead: Test 1 failed on encryption for authenc-hmac-sha1-cbc-3des-talitos [ 2.114259] 00000000: 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 73 74 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 00000030: 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 00000040: 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 00000050: 65 72 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 00000060: 72 63 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 00000070: 63 65 65 72 73 74 54 20 6f 6f 4d 20 6e 61 0a 79 00000080: c0 50 f1 ac c0 50 f3 38 c0 50 f3 94 c0 50 f5 30 00000090: c0 99 74 3c [ 2.166410] alg: aead: Test 1 failed on encryption for authenc-hmac-sha1-cbc-3des-talitos [ 2.174794] 00000000: 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 73 74 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 00000030: 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 00000040: 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 00000050: 65 72 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 00000060: 72 63 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 00000070: 63 65 65 72 73 74 54 20 6f 6f 4d 20 6e 61 0a 79 00000080: c0 50 f1 ac c0 50 f3 38 c0 50 f3 94 c0 50 f5 30 00000090: c0 99 74 3c [ 2.226486] alg: No test for authenc(hmac(sha224),cbc(aes)) (authenc-hmac-sha224-cbc-aes-talitos) [ 2.236459] alg: No test for authenc(hmac(sha224),cbc(aes)) (authenc-hmac-sha224-cbc-aes-talitos) [ 2.247196] alg: aead: Test 1 failed on encryption for authenc-hmac-sha224-cbc-3des-talitos [ 2.255555] 00000000: 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 73 74 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 00000030: 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 00000040: 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 00000050: 65 72 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 00000060: 72 63 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 00000070: 63 65 65 72 73 74 54 20 6f 6f 4d 20 6e 61 0a 79 00000080: c0 50 f1 ac c0 50 f3 38 c0 50 f3 94 c0 50 f5 30 00000090: c0 99 74 3c c0 96 e5 b8 [ 2.309004] alg: aead: Test 1 failed on encryption for authenc-hmac-sha224-cbc-3des-talitos [ 2.317562] 00000000: 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 73 74 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 00000030: 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 00000040: 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 00000050: 65 72 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 00000060: 72 63 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 00000070: 63 65 65 72 73 74 54 20 6f 6f 4d 20 6e 61 0a 79 00000080: c0 50 f1 ac c0 50 f3 38 c0 50 f3 94 c0 50 f5 30 00000090: c0 99 74 3c c0 96 e5 b8 [ 2.370710] alg: aead: Test 1 failed on encryption for authenc-hmac-sha256-cbc-aes-talitos [ 2.379177] 00000000: 53 69 6e 67 6c 65 20 62 6c 6f 63 6b 20 6d 73 67 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 
65 65 72 [ 2.397863] alg: aead: Test 1 failed on encryption for authenc-hmac-sha256-cbc-aes-talitos [ 2.406134] 00000000: 53 69 6e 67 6c 65 20 62 6c 6f 63 6b 20 6d 73 67 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 [ 2.424789] alg: aead: Test 1 failed on encryption for authenc-hmac-sha256-cbc-3des-talitos [ 2.433491] 00000000: 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 73 74 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 00000030: 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 00000040: 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 00000050: 65 72 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 00000060: 72 63 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 00000070: 63 65 65 72 73 74 54 20 6f 6f 4d 20 6e 61 0a 79 00000080: c0 50 f1 ac c0 50 f3 38 c0 50 f3 94 c0 50 f5 30 00000090: c0 99 74 3c c0 96 e5 b8 c0 96 e9 20 c0 00 3d dc [ 2.488832] alg: aead: Test 1 failed on encryption for authenc-hmac-sha256-cbc-3des-talitos [ 2.497387] 00000000: 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 73 74 00000010: 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 74 65 00000020: 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 65 72 00000030: 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 72 63 00000040: 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 63 65 00000050: 65 72 73 74 54 20 6f 6f 4d 20 6e 61 20 79 65 53 00000060: 72 63 74 65 20 73 6f 54 20 6f 61 4d 79 6e 53 20 00000070: 63 65 65 72 73 74 54 20 6f 6f 4d 20 6e 61 0a 79 00000080: c0 50 f1 ac c0 50 f3 38 c0 50 f3 94 c0 50 f5 30 00000090: c0 99 74 3c c0 96 e5 b8 c0 96 e9 20 c0 00 3d dc This patch fixes that. Signed-off-by: Christophe Leroy Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/talitos.c | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index dff88838dce7..cd8a37e60259 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1232,12 +1232,11 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq, sg_link_tbl_len += authsize; } - sg_count = talitos_sg_map(dev, areq->src, cryptlen, edesc, - &desc->ptr[4], sg_count, areq->assoclen, - tbl_off); + ret = talitos_sg_map(dev, areq->src, cryptlen, edesc, &desc->ptr[4], + sg_count, areq->assoclen, tbl_off); - if (sg_count > 1) { - tbl_off += sg_count; + if (ret > 1) { + tbl_off += ret; sync_needed = true; } From 68b426477820e396f204e06c6de4063a82d63422 Mon Sep 17 00:00:00 2001 From: LEROY Christophe Date: Fri, 6 Oct 2017 15:04:35 +0200 Subject: [PATCH 0268/1013] crypto: talitos - fix memory corruption on SEC2 commit e04a61bebc5da1535b6f194b464295b8d558e2fc upstream. On SEC2, when using the old descriptors type (hmac snoop no afeu) for doing IPsec, the CICV out pointeur points out of the allocated memory. [ 2.502554] ============================================================================= [ 2.510740] BUG dma-kmalloc-256 (Not tainted): Redzone overwritten [ 2.516907] ----------------------------------------------------------------------------- [ 2.516907] [ 2.526535] Disabling lock debugging due to kernel taint [ 2.531845] INFO: 0xde858108-0xde85810b. 
First byte 0xf8 instead of 0xcc [ 2.538549] INFO: Allocated in 0x806181a9 age=0 cpu=0 pid=58 [ 2.544229] __kmalloc+0x374/0x564 [ 2.547649] talitos_edesc_alloc+0x17c/0x48c [ 2.551929] aead_edesc_alloc+0x80/0x154 [ 2.555863] aead_encrypt+0x30/0xe0 [ 2.559368] __test_aead+0x5a0/0x1f3c [ 2.563042] test_aead+0x2c/0x110 [ 2.566371] alg_test_aead+0x5c/0xf4 [ 2.569958] alg_test+0x1dc/0x5a0 [ 2.573305] cryptomgr_test+0x50/0x70 [ 2.576984] kthread+0xd8/0x134 [ 2.580155] ret_from_kernel_thread+0x5c/0x64 [ 2.584534] INFO: Freed in ipsec_esp_encrypt_done+0x130/0x240 age=6 cpu=0 pid=0 [ 2.591839] ipsec_esp_encrypt_done+0x130/0x240 [ 2.596395] flush_channel+0x1dc/0x488 [ 2.600161] talitos2_done_4ch+0x30/0x200 [ 2.604185] tasklet_action+0xa0/0x13c [ 2.607948] __do_softirq+0x148/0x6cc [ 2.611623] irq_exit+0xc0/0x124 [ 2.614869] call_do_irq+0x24/0x3c [ 2.618292] do_IRQ+0x78/0x108 [ 2.621369] ret_from_except+0x0/0x14 [ 2.625055] finish_task_switch+0x58/0x350 [ 2.629165] schedule+0x80/0x134 [ 2.632409] schedule_preempt_disabled+0x38/0xc8 [ 2.637042] cpu_startup_entry+0xe4/0x190 [ 2.641074] start_kernel+0x3f4/0x408 [ 2.644741] 0x3438 [ 2.646857] INFO: Slab 0xdffbdb00 objects=9 used=1 fp=0xde8581c0 flags=0x0080 [ 2.653978] INFO: Object 0xde858008 @offset=8 fp=0xca4395df [ 2.653978] [ 2.661032] Redzone de858000: cc cc cc cc cc cc cc cc ........ [ 2.669029] Object de858008: 00 00 00 02 00 00 00 02 00 6b 6b 6b 1e 83 ea 28 .........kkk...( [ 2.677628] Object de858018: 00 00 00 70 1e 85 80 64 ff 73 1d 21 6b 6b 6b 6b ...p...d.s.!kkkk [ 2.686228] Object de858028: 00 20 00 00 1e 84 17 24 00 10 00 00 1e 85 70 00 . .....$......p. [ 2.694829] Object de858038: 00 18 00 00 1e 84 17 44 00 08 00 00 1e 83 ea 28 .......D.......( [ 2.703430] Object de858048: 00 80 00 00 1e 84 f0 00 00 80 00 00 1e 85 70 10 ..............p. [ 2.712030] Object de858058: 00 20 6b 00 1e 85 80 f4 6b 6b 6b 6b 00 80 02 00 . k.....kkkk.... [ 2.720629] Object de858068: 1e 84 f0 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ....kkkkkkkkkkkk [ 2.729230] Object de858078: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 2.737830] Object de858088: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 2.746429] Object de858098: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 2.755029] Object de8580a8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 2.763628] Object de8580b8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 2.772229] Object de8580c8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 2.780829] Object de8580d8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk [ 2.789430] Object de8580e8: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 73 b0 ea 9f kkkkkkkkkkkks... [ 2.798030] Object de8580f8: e8 18 80 d6 56 38 44 c0 db e3 4f 71 f7 ce d1 d3 ....V8D...Oq.... 
[ 2.806629] Redzone de858108: f8 bd 3e 4f ..>O [ 2.814279] Padding de8581b0: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ [ 2.822283] CPU: 0 PID: 0 Comm: swapper Tainted: G B 4.9.50-g995be12679 #179 [ 2.831819] Call Trace: [ 2.834301] [dffefd20] [c01aa9a8] check_bytes_and_report+0x100/0x194 (unreliable) [ 2.841801] [dffefd50] [c01aac3c] check_object+0x200/0x530 [ 2.847306] [dffefd80] [c01ae584] free_debug_processing+0x290/0x690 [ 2.853585] [dffefde0] [c01aec8c] __slab_free+0x308/0x628 [ 2.859000] [dffefe80] [c05057f4] ipsec_esp_encrypt_done+0x130/0x240 [ 2.865378] [dffefeb0] [c05002c4] flush_channel+0x1dc/0x488 [ 2.870968] [dffeff10] [c05007a8] talitos2_done_4ch+0x30/0x200 [ 2.876814] [dffeff30] [c002fe38] tasklet_action+0xa0/0x13c [ 2.882399] [dffeff60] [c002f118] __do_softirq+0x148/0x6cc [ 2.887896] [dffeffd0] [c002f954] irq_exit+0xc0/0x124 [ 2.892968] [dffefff0] [c0013adc] call_do_irq+0x24/0x3c [ 2.898213] [c0d4be00] [c000757c] do_IRQ+0x78/0x108 [ 2.903113] [c0d4be30] [c0015c08] ret_from_except+0x0/0x14 [ 2.908634] --- interrupt: 501 at finish_task_switch+0x70/0x350 [ 2.908634] LR = finish_task_switch+0x58/0x350 [ 2.919327] [c0d4bf20] [c085e1d4] schedule+0x80/0x134 [ 2.924398] [c0d4bf50] [c085e2c0] schedule_preempt_disabled+0x38/0xc8 [ 2.930853] [c0d4bf60] [c007f064] cpu_startup_entry+0xe4/0x190 [ 2.936707] [c0d4bfb0] [c096c434] start_kernel+0x3f4/0x408 [ 2.942198] [c0d4bff0] [00003438] 0x3438 [ 2.946137] FIX dma-kmalloc-256: Restoring 0xde858108-0xde85810b=0xcc [ 2.946137] [ 2.954158] FIX dma-kmalloc-256: Object at 0xde858008 not freed This patch reworks the handling of the CICV out in order to properly handle all cases. Signed-off-by: Christophe Leroy Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/talitos.c | 42 ++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index cd8a37e60259..1e799886c57d 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1247,14 +1247,15 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq, dma_map_sg(dev, areq->dst, sg_count, DMA_FROM_DEVICE); } - sg_count = talitos_sg_map(dev, areq->dst, cryptlen, edesc, - &desc->ptr[5], sg_count, areq->assoclen, - tbl_off); + ret = talitos_sg_map(dev, areq->dst, cryptlen, edesc, &desc->ptr[5], + sg_count, areq->assoclen, tbl_off); if (desc->hdr & DESC_HDR_TYPE_IPSEC_ESP) to_talitos_ptr_ext_or(&desc->ptr[5], authsize, is_sec1); - if (sg_count > 1) { + /* ICV data */ + if (ret > 1) { + tbl_off += ret; edesc->icv_ool = true; sync_needed = true; @@ -1264,9 +1265,7 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq, sizeof(struct talitos_ptr) + authsize; /* Add an entry to the link table for ICV data */ - tbl_ptr += sg_count - 1; - to_talitos_ptr_ext_set(tbl_ptr, 0, is_sec1); - tbl_ptr++; + to_talitos_ptr_ext_set(tbl_ptr - 1, 0, is_sec1); to_talitos_ptr_ext_set(tbl_ptr, DESC_PTR_LNKTBL_RETURN, is_sec1); to_talitos_ptr_len(tbl_ptr, authsize, is_sec1); @@ -1274,18 +1273,33 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq, /* icv data follows link tables */ to_talitos_ptr(tbl_ptr, edesc->dma_link_tbl + offset, is_sec1); + } else { + dma_addr_t addr = edesc->dma_link_tbl; + + if (is_sec1) + addr += areq->assoclen + cryptlen; + else + addr += sizeof(struct talitos_ptr) * tbl_off; + + to_talitos_ptr(&desc->ptr[6], addr, is_sec1); + to_talitos_ptr_len(&desc->ptr[6], authsize, is_sec1); + } + } else if 
(!(desc->hdr & DESC_HDR_TYPE_IPSEC_ESP)) { + ret = talitos_sg_map(dev, areq->dst, authsize, edesc, + &desc->ptr[6], sg_count, areq->assoclen + + cryptlen, + tbl_off); + if (ret > 1) { + tbl_off += ret; + edesc->icv_ool = true; + sync_needed = true; + } else { + edesc->icv_ool = false; } } else { edesc->icv_ool = false; } - /* ICV data */ - if (!(desc->hdr & DESC_HDR_TYPE_IPSEC_ESP)) { - to_talitos_ptr_len(&desc->ptr[6], authsize, is_sec1); - to_talitos_ptr(&desc->ptr[6], edesc->dma_link_tbl + - areq->assoclen + cryptlen, is_sec1); - } - /* iv out */ if (desc->hdr & DESC_HDR_TYPE_IPSEC_ESP) map_single_talitos_ptr(dev, &desc->ptr[6], ivsize, ctx->iv, From 62744ebaeb91ff089865f90d0a0a1475a924809f Mon Sep 17 00:00:00 2001 From: LEROY Christophe Date: Fri, 6 Oct 2017 15:04:37 +0200 Subject: [PATCH 0269/1013] crypto: talitos - fix setkey to check key weakness commit f384cdc4faf350fdb6ad93c5f26952b9ba7c7566 upstream. Crypto manager test report the following failures: [ 3.061081] alg: skcipher: setkey failed on test 5 for ecb-des-talitos: flags=100 [ 3.069342] alg: skcipher-ddst: setkey failed on test 5 for ecb-des-talitos: flags=100 [ 3.077754] alg: skcipher-ddst: setkey failed on test 5 for ecb-des-talitos: flags=100 This is due to setkey being expected to detect weak keys. Signed-off-by: Christophe Leroy Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/talitos.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 1e799886c57d..8aa1212086f4 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1507,12 +1507,20 @@ static int ablkcipher_setkey(struct crypto_ablkcipher *cipher, const u8 *key, unsigned int keylen) { struct talitos_ctx *ctx = crypto_ablkcipher_ctx(cipher); + u32 tmp[DES_EXPKEY_WORDS]; if (keylen > TALITOS_MAX_KEY_SIZE) { crypto_ablkcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN); return -EINVAL; } + if (unlikely(crypto_ablkcipher_get_flags(cipher) & + CRYPTO_TFM_REQ_WEAK_KEY) && + !des_ekey(tmp, key)) { + crypto_ablkcipher_set_flags(cipher, CRYPTO_TFM_RES_WEAK_KEY); + return -EINVAL; + } + memcpy(&ctx->key, key, keylen); ctx->keylen = keylen; From a2d93ada4fd30feeb3d0c914926f7a9fdea10102 Mon Sep 17 00:00:00 2001 From: LEROY Christophe Date: Fri, 6 Oct 2017 15:04:39 +0200 Subject: [PATCH 0270/1013] crypto: talitos - fix AEAD for sha224 on non sha224 capable chips commit 6cda075aff67a1b9b5ba1b2818091dc939643b6c upstream. sha224 AEAD test fails with: [ 2.803125] talitos ff020000.crypto: DEUISR 0x00000000_00000000 [ 2.808743] talitos ff020000.crypto: MDEUISR 0x80100000_00000000 [ 2.814678] talitos ff020000.crypto: DESCBUF 0x20731f21_00000018 [ 2.820616] talitos ff020000.crypto: DESCBUF 0x0628d64c_00000010 [ 2.826554] talitos ff020000.crypto: DESCBUF 0x0631005c_00000018 [ 2.832492] talitos ff020000.crypto: DESCBUF 0x0628d664_00000008 [ 2.838430] talitos ff020000.crypto: DESCBUF 0x061b13a0_00000080 [ 2.844369] talitos ff020000.crypto: DESCBUF 0x0631006c_00000080 [ 2.850307] talitos ff020000.crypto: DESCBUF 0x0631006c_00000018 [ 2.856245] talitos ff020000.crypto: DESCBUF 0x063100ec_00000000 [ 2.884972] talitos ff020000.crypto: failed to reset channel 0 [ 2.890503] talitos ff020000.crypto: done overflow, internal time out, or rngu error: ISR 0x20000000_00020000 [ 2.900652] alg: aead: encryption failed on test 1 for authenc-hmac-sha224-cbc-3des-talitos: ret=22 This is due to SHA224 not being supported by the HW. 
Allthough for hash we are able to init the hash context by SW, it is not possible for AEAD. Therefore SHA224 AEAD has to be deactivated. Signed-off-by: Christophe Leroy Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/talitos.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index 8aa1212086f4..b7184f305867 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -3068,6 +3068,11 @@ static struct talitos_crypto_alg *talitos_alg_alloc(struct device *dev, t_alg->algt.alg.aead.setkey = aead_setkey; t_alg->algt.alg.aead.encrypt = aead_encrypt; t_alg->algt.alg.aead.decrypt = aead_decrypt; + if (!(priv->features & TALITOS_FTR_SHA224_HWINIT) && + !strncmp(alg->cra_name, "authenc(hmac(sha224)", 20)) { + kfree(t_alg); + return ERR_PTR(-ENOTSUPP); + } break; case CRYPTO_ALG_TYPE_AHASH: alg = &t_alg->algt.alg.hash.halg.base; From 2040f8e814055a330aae1ed83c3fe2aba26173f0 Mon Sep 17 00:00:00 2001 From: LEROY Christophe Date: Fri, 6 Oct 2017 15:04:41 +0200 Subject: [PATCH 0271/1013] crypto: talitos - fix use of sg_link_tbl_len commit fbb22137c4d9bab536958b152d096fb3f98020ea upstream. sg_link_tbl_len shall be used instead of cryptlen, otherwise SECs which perform HW CICV verification will fail. Signed-off-by: Christophe Leroy Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/talitos.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index b7184f305867..cf5c9701b898 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -1232,8 +1232,8 @@ static int ipsec_esp(struct talitos_edesc *edesc, struct aead_request *areq, sg_link_tbl_len += authsize; } - ret = talitos_sg_map(dev, areq->src, cryptlen, edesc, &desc->ptr[4], - sg_count, areq->assoclen, tbl_off); + ret = talitos_sg_map(dev, areq->src, sg_link_tbl_len, edesc, + &desc->ptr[4], sg_count, areq->assoclen, tbl_off); if (ret > 1) { tbl_off += ret; From 077415efefe77f6f3e82ae4868867e32ede01286 Mon Sep 17 00:00:00 2001 From: LEROY Christophe Date: Fri, 6 Oct 2017 15:04:43 +0200 Subject: [PATCH 0272/1013] crypto: talitos - fix ctr-aes-talitos commit 70d355ccea899dad47dc22d3a4406998f55143fd upstream. 
ctr-aes-talitos test fails as follows on SEC2 [ 0.837427] alg: skcipher: Test 1 failed (invalid result) on encryption for ctr-aes-talitos [ 0.845763] 00000000: 16 36 d5 ee 34 f8 06 25 d7 7f 8e 56 ca 88 43 45 [ 0.852345] 00000010: f9 3f f7 17 2a b2 12 23 30 43 09 15 82 dd e1 97 [ 0.858940] 00000020: a7 f7 32 b5 eb 25 06 13 9a ec f5 29 25 f8 4d 66 [ 0.865366] 00000030: b0 03 5b 8e aa 9a 42 b6 19 33 8a e2 9d 65 96 95 This patch fixes the descriptor type which is special for CTR AES Fixes: 5e75ae1b3cef6 ("crypto: talitos - add new crypto modes") Signed-off-by: Christophe Leroy Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/talitos.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c index cf5c9701b898..a19b5d0300a9 100644 --- a/drivers/crypto/talitos.c +++ b/drivers/crypto/talitos.c @@ -2635,7 +2635,7 @@ static struct talitos_alg_template driver_algs[] = { .ivsize = AES_BLOCK_SIZE, } }, - .desc_hdr_template = DESC_HDR_TYPE_COMMON_NONSNOOP_NO_AFEU | + .desc_hdr_template = DESC_HDR_TYPE_AESU_CTR_NONSNOOP | DESC_HDR_SEL0_AESU | DESC_HDR_MODE0_AESU_CTR, }, From 21e1e6192ba653dfcc42bb0b275a4af7af5e32b0 Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 24 Nov 2017 23:49:34 +0000 Subject: [PATCH 0273/1013] ARM: BUG if jumping to usermode address in kernel mode commit 8bafae202c82dc257f649ea3c275a0f35ee15113 upstream. Detect if we are returning to usermode via the normal kernel exit paths but the saved PSR value indicates that we are in kernel mode. This could occur due to corrupted stack state, which has been observed with "ftracetest". This ensures that we catch the problem case before we get to user code. Signed-off-by: Russell King Cc: Alex Shi Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/assembler.h | 18 ++++++++++++++++++ arch/arm/kernel/entry-header.S | 6 ++++++ 2 files changed, 24 insertions(+) diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h index ad301f107dd2..bc8d4bbd82e2 100644 --- a/arch/arm/include/asm/assembler.h +++ b/arch/arm/include/asm/assembler.h @@ -518,4 +518,22 @@ THUMB( orr \reg , \reg , #PSR_T_BIT ) #endif .endm + .macro bug, msg, line +#ifdef CONFIG_THUMB2_KERNEL +1: .inst 0xde02 +#else +1: .inst 0xe7f001f2 +#endif +#ifdef CONFIG_DEBUG_BUGVERBOSE + .pushsection .rodata.str, "aMS", %progbits, 1 +2: .asciz "\msg" + .popsection + .pushsection __bug_table, "aw" + .align 2 + .word 1b, 2b + .hword \line + .popsection +#endif + .endm + #endif /* __ASM_ASSEMBLER_H__ */ diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S index d523cd8439a3..7f4d80c2db6b 100644 --- a/arch/arm/kernel/entry-header.S +++ b/arch/arm/kernel/entry-header.S @@ -300,6 +300,8 @@ mov r2, sp ldr r1, [r2, #\offset + S_PSR] @ get calling cpsr ldr lr, [r2, #\offset + S_PC]! @ get pc + tst r1, #0xcf + bne 1f msr spsr_cxsf, r1 @ save in spsr_svc #if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_32v6K) @ We must avoid clrex due to Cortex-A15 erratum #830321 @@ -314,6 +316,7 @@ @ after ldm {}^ add sp, sp, #\offset + PT_REGS_SIZE movs pc, lr @ return & move spsr_svc into cpsr +1: bug "Returning to usermode but unexpected PSR bits set?", \@ #elif defined(CONFIG_CPU_V7M) @ V7M restore. 
@ Note that we don't need to do clrex here as clearing the local @@ -329,6 +332,8 @@ ldr r1, [sp, #\offset + S_PSR] @ get calling cpsr ldr lr, [sp, #\offset + S_PC] @ get pc add sp, sp, #\offset + S_SP + tst r1, #0xcf + bne 1f msr spsr_cxsf, r1 @ save in spsr_svc @ We must avoid clrex due to Cortex-A15 erratum #830321 @@ -341,6 +346,7 @@ .endif add sp, sp, #PT_REGS_SIZE - S_SP movs pc, lr @ return & move spsr_svc into cpsr +1: bug "Returning to usermode but unexpected PSR bits set?", \@ #endif /* !CONFIG_THUMB2_KERNEL */ .endm From 14f13c9d58d8d738666d58fa7b2f8872286655bf Mon Sep 17 00:00:00 2001 From: Russell King Date: Mon, 27 Nov 2017 11:22:42 +0000 Subject: [PATCH 0274/1013] ARM: avoid faulting on qemu commit 3aaf33bebda8d4ffcc0fc8ef39e6c1ac68823b11 upstream. When qemu starts a kernel in a bare environment, the default SCR has the AW and FW bits clear, which means that the kernel can't modify the PSR A or PSR F bits, and means that FIQs and imprecise aborts are always masked. When running uboot under qemu, the AW and FW SCR bits are set, and the kernel functions normally - and this is how real hardware behaves. Fix this for qemu by ignoring the FIQ bit. Fixes: 8bafae202c82 ("ARM: BUG if jumping to usermode address in kernel mode") Signed-off-by: Russell King Cc: Alex Shi Signed-off-by: Greg Kroah-Hartman --- arch/arm/kernel/entry-header.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/arm/kernel/entry-header.S b/arch/arm/kernel/entry-header.S index 7f4d80c2db6b..0f07579af472 100644 --- a/arch/arm/kernel/entry-header.S +++ b/arch/arm/kernel/entry-header.S @@ -300,7 +300,7 @@ mov r2, sp ldr r1, [r2, #\offset + S_PSR] @ get calling cpsr ldr lr, [r2, #\offset + S_PC]! @ get pc - tst r1, #0xcf + tst r1, #PSR_I_BIT | 0x0f bne 1f msr spsr_cxsf, r1 @ save in spsr_svc #if defined(CONFIG_CPU_V6) || defined(CONFIG_CPU_32v6K) @@ -332,7 +332,7 @@ ldr r1, [sp, #\offset + S_PSR] @ get calling cpsr ldr lr, [sp, #\offset + S_PC] @ get pc add sp, sp, #\offset + S_SP - tst r1, #0xcf + tst r1, #PSR_I_BIT | 0x0f bne 1f msr spsr_cxsf, r1 @ save in spsr_svc From f0d564230e08d314aac54cbcb1865a4b8fafe906 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 17 Nov 2017 18:35:53 +0000 Subject: [PATCH 0275/1013] irqchip/qcom: Fix u32 comparison with value less than zero [ Upstream commit e9990d70e8a063a7b894c5cbb99f630a0f41200d ] The comparison of u32 nregs being less than zero is never true since nregs is unsigned. Fix this by making nregs a signed integer. 
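This is the well-known unsigned-comparison pitfall: a negative errno stored in a u32 becomes a huge positive number, so the error check is dead code, and GCC's -Wtype-limits warns that the comparison is always false. A short sketch of the before/after shape, assuming count_registers() reports failure as a negative errno, as the check implies:

    /* Broken: nregs can never be negative, so the error path is unreachable. */
    u32 nregs = count_registers(pdev);
    if (nregs < 0)
            return nregs;

    /* Fixed: a signed type preserves the negative errno. */
    int err_or_count = count_registers(pdev);
    if (err_or_count < 0)
            return err_or_count;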
Fixes: f20cc9b00c7b ("irqchip/qcom: Add IRQ combiner driver") Signed-off-by: Colin Ian King Signed-off-by: Thomas Gleixner Cc: Marc Zyngier Cc: kernel-janitors@vger.kernel.org Cc: Jason Cooper Link: https://lkml.kernel.org/r/20171117183553.2739-1-colin.king@canonical.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/irqchip/qcom-irq-combiner.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/irqchip/qcom-irq-combiner.c b/drivers/irqchip/qcom-irq-combiner.c index 6aa3ea479214..f31265937439 100644 --- a/drivers/irqchip/qcom-irq-combiner.c +++ b/drivers/irqchip/qcom-irq-combiner.c @@ -238,7 +238,7 @@ static int __init combiner_probe(struct platform_device *pdev) { struct combiner *combiner; size_t alloc_sz; - u32 nregs; + int nregs; int err; nregs = count_registers(pdev); From 4d0d1bc65b5710f45bc5fe10b4fbfc2671b0851d Mon Sep 17 00:00:00 2001 From: Ursula Braun Date: Tue, 21 Nov 2017 13:23:53 +0100 Subject: [PATCH 0276/1013] net/smc: use sk_rcvbuf as start for rmb creation [ Upstream commit 4e1061f4a2bba1669c7297455c73ddafbebf2b12 ] Commit 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers") merged handling of SMC receive and send buffers. It introduced sk_buf_size as merged start value for size determination. But since sk_buf_size is not used at all, sk_sndbuf is erroneously used as start for rmb creation. This patch makes sure, sk_buf_size is really used as intended, and sk_rcvbuf is used as start value for rmb creation. Fixes: 3e034725c0d8 ("net/smc: common functions for RMBs and send buffers") Signed-off-by: Ursula Braun Reviewed-by: Hans Wippel Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/smc/smc_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c index 413e3868fbf3..7166e7ecbe86 100644 --- a/net/smc/smc_core.c +++ b/net/smc/smc_core.c @@ -571,7 +571,7 @@ static int __smc_buf_create(struct smc_sock *smc, bool is_rmb) /* use socket send buffer size (w/o overhead) as start value */ sk_buf_size = smc->sk.sk_sndbuf / 2; - for (bufsize_short = smc_compress_bufsize(smc->sk.sk_sndbuf / 2); + for (bufsize_short = smc_compress_bufsize(sk_buf_size); bufsize_short >= 0; bufsize_short--) { if (is_rmb) { From 54a13eb7f03ac8bb92b3b2c358e42eb08834240b Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Wed, 15 Nov 2017 18:17:07 +0900 Subject: [PATCH 0277/1013] kbuild: pkg: use --transform option to prefix paths in tar [ Upstream commit 2dbc644ac62bbcb9ee78e84719953f611be0413d ] For rpm-pkg and deb-pkg, a source tar file is created. All paths in the archive must be prefixed with the base name of the tar so that everything is contained in the directory when you extract it. Currently, scripts/package/Makefile uses a symlink for that, and removes it after the tar is created. If you terminate the build during the tar creation, the symlink is left over. Then, at the next package build, you will see a warning like follows: ln: '.' and 'kernel-4.14.0+/.' are the same file It is possible to fix it by adding -n (--no-dereference) option to the "ln" command, but a cleaner way is to use --transform option of "tar" command. This option is GNU extension, but it should not hurt to use it in the Linux build system. The 'S' flag is needed to exclude symlinks from the path fixup. Without it, symlinks in the kernel are broken. 
Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- scripts/package/Makefile | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/scripts/package/Makefile b/scripts/package/Makefile index 73f9f3192b9f..2b4aa4c19b21 100644 --- a/scripts/package/Makefile +++ b/scripts/package/Makefile @@ -39,10 +39,9 @@ if test "$(objtree)" != "$(srctree)"; then \ false; \ fi ; \ $(srctree)/scripts/setlocalversion --save-scmversion; \ -ln -sf $(srctree) $(2); \ tar -cz $(RCS_TAR_IGNORE) -f $(2).tar.gz \ - $(addprefix $(2)/,$(TAR_CONTENT) $(3)); \ -rm -f $(2) $(objtree)/.scmversion + --transform 's:^:$(2)/:S' $(TAR_CONTENT) $(3); \ +rm -f $(objtree)/.scmversion # rpm-pkg # --------------------------------------------------------------------------- From 394d0c93b9dd2dee2024a9d922a60b3a3aadb538 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Tue, 14 Nov 2017 20:38:07 +0900 Subject: [PATCH 0278/1013] coccinelle: fix parallel build with CHECK=scripts/coccicheck [ Upstream commit d7059ca0147adcd495f3c5b41f260e1ac55bb679 ] The command "make -j8 C=1 CHECK=scripts/coccicheck" produces lots of "coccicheck failed" error messages. Julia Lawall explained the Coccinelle behavior as follows: "The problem on the Coccinelle side is that it uses a subdirectory with the name of the semantic patch to store standard output and standard error for the different threads. I didn't want to use a name with the pid, so that one could easily find this information while Coccinelle is running. Normally the subdirectory is cleaned up when Coccinelle completes, so there is only one of them at a time. Maybe it is best to just add the pid. There is the risk that these subdirectories will accumulate if Coccinelle crashes in a way such that they don't get cleaned up, but Coccinelle could print a warning if it detects this case, rather than failing." When scripts/coccicheck is used as CHECK tool and -j option is given to Make, the whole of build process runs in parallel. So, multiple processes try to get access to the same subdirectory. I notice spatch creates the subdirectory only when it runs in parallel (i.e. --jobs is given and is greater than 1). Setting NPROC=1 is a reasonable solution; spatch does not create the subdirectory. Besides, ONLINE=1 mode takes a single file input for each spatch invocation, so there is no reason to parallelize it in the first place. Signed-off-by: Masahiro Yamada Acked-by: Julia Lawall Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- scripts/coccicheck | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/scripts/coccicheck b/scripts/coccicheck index 28ad1feff9e1..dda283aba96b 100755 --- a/scripts/coccicheck +++ b/scripts/coccicheck @@ -30,12 +30,6 @@ else VERBOSE=0 fi -if [ -z "$J" ]; then - NPROC=$(getconf _NPROCESSORS_ONLN) -else - NPROC="$J" -fi - FLAGS="--very-quiet" # You can use SPFLAGS to append extra arguments to coccicheck or override any @@ -70,6 +64,9 @@ if [ "$C" = "1" -o "$C" = "2" ]; then # Take only the last argument, which is the C file to test shift $(( $# - 1 )) OPTIONS="$COCCIINCLUDE $1" + + # No need to parallelize Coccinelle since this mode takes one input file. 
+ NPROC=1 else ONLINE=0 if [ "$KBUILD_EXTMOD" = "" ] ; then @@ -77,6 +74,12 @@ else else OPTIONS="--dir $KBUILD_EXTMOD $COCCIINCLUDE" fi + + if [ -z "$J" ]; then + NPROC=$(getconf _NPROCESSORS_ONLN) + else + NPROC="$J" + fi fi if [ "$KBUILD_EXTMOD" != "" ] ; then From 86d1d015fe06267b60d496971d01c859aaa8ce08 Mon Sep 17 00:00:00 2001 From: Madhavan Srinivasan Date: Wed, 22 Nov 2017 10:45:38 +0530 Subject: [PATCH 0279/1013] powerpc/perf: Fix pmu_count to count only nest imc pmus [ Upstream commit de34787f1096cce38e2590be0013b44418d14546 ] "pmu_count" in opal_imc_counters_probe() is intended to hold the number of successful nest imc pmu registerations. But current code also counts other imc units like core_imc and thread_imc. Patch add a check to count only nest imc pmus. Signed-off-by: Madhavan Srinivasan Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powernv/opal-imc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c index 21f6531fae20..b150f4deaccf 100644 --- a/arch/powerpc/platforms/powernv/opal-imc.c +++ b/arch/powerpc/platforms/powernv/opal-imc.c @@ -191,8 +191,10 @@ static int opal_imc_counters_probe(struct platform_device *pdev) break; } - if (!imc_pmu_create(imc_dev, pmu_count, domain)) - pmu_count++; + if (!imc_pmu_create(imc_dev, pmu_count, domain)) { + if (domain == IMC_DOMAIN_NEST) + pmu_count++; + } } return 0; From 897088926c3477ef8d3d299473a2777dd9d1c9cb Mon Sep 17 00:00:00 2001 From: John Johansen Date: Wed, 15 Nov 2017 15:25:30 -0800 Subject: [PATCH 0280/1013] apparmor: fix leak of null profile name if profile allocation fails [ Upstream commit 4633307e5ed6128975595df43f796a10c41d11c1 ] Fixes: d07881d2edb0 ("apparmor: move new_null_profile to after profile lookup fns()") Reported-by: Seth Arnold Signed-off-by: John Johansen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- security/apparmor/policy.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/security/apparmor/policy.c b/security/apparmor/policy.c index 4243b0c3f0e4..586b249d3b46 100644 --- a/security/apparmor/policy.c +++ b/security/apparmor/policy.c @@ -502,7 +502,7 @@ struct aa_profile *aa_new_null_profile(struct aa_profile *parent, bool hat, { struct aa_profile *p, *profile; const char *bname; - char *name; + char *name = NULL; AA_BUG(!parent); @@ -562,6 +562,7 @@ struct aa_profile *aa_new_null_profile(struct aa_profile *parent, bool hat, return profile; fail: + kfree(name); aa_free_profile(profile); return NULL; } From ad7acca17e31d14ceca8824728440e727150ddd0 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Fri, 10 Nov 2017 16:12:29 -0800 Subject: [PATCH 0281/1013] x86/mpx/selftests: Fix up weird arrays [ Upstream commit a6400120d042397675fcf694060779d21e9e762d ] The MPX hardware data structurse are defined in a weird way: they define their size in bytes and then union that with the type with which we want to access them. Yes, this is weird, but it does work. But, new GCC's complain that we are accessing the array out of bounds. Just make it a zero-sized array so gcc will stop complaining. There was not really a bug here. Signed-off-by: Dave Hansen Acked-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171111001229.58A7933D@viggo.jf.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/x86/mpx-hw.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/x86/mpx-hw.h b/tools/testing/selftests/x86/mpx-hw.h index 3f0093911f03..d1b61ab870f8 100644 --- a/tools/testing/selftests/x86/mpx-hw.h +++ b/tools/testing/selftests/x86/mpx-hw.h @@ -52,14 +52,14 @@ struct mpx_bd_entry { union { char x[MPX_BOUNDS_DIR_ENTRY_SIZE_BYTES]; - void *contents[1]; + void *contents[0]; }; } __attribute__((packed)); struct mpx_bt_entry { union { char x[MPX_BOUNDS_TABLE_ENTRY_SIZE_BYTES]; - unsigned long contents[1]; + unsigned long contents[0]; }; } __attribute__((packed)); From 266fd76296bea9f392ae205e7a08f5f5726dc7f9 Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Fri, 10 Nov 2017 18:48:50 +0000 Subject: [PATCH 0282/1013] mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl() [ Upstream commit 67bd52386125ce1159c0581cbcd2740addf33cd4 ] hwsim_new_radio_nl() now copies the name attribute in order to add a null-terminator. mac80211_hwsim_new_radio() (indirectly) copies it again into the net_device structure, so the first copy is not used or freed later. Free the first copy before returning. Fixes: ff4dd73dd2b4 ("mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length") Signed-off-by: Ben Hutchings Signed-off-by: Johannes Berg Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/mac80211_hwsim.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c index 6467ffac9811..a59b54328c07 100644 --- a/drivers/net/wireless/mac80211_hwsim.c +++ b/drivers/net/wireless/mac80211_hwsim.c @@ -3108,6 +3108,7 @@ static int hwsim_new_radio_nl(struct sk_buff *msg, struct genl_info *info) { struct hwsim_new_radio_params param = { 0 }; const char *hwname = NULL; + int ret; param.reg_strict = info->attrs[HWSIM_ATTR_REG_STRICT_REG]; param.p2p_device = info->attrs[HWSIM_ATTR_SUPPORT_P2P_DEVICE]; @@ -3147,7 +3148,9 @@ static int hwsim_new_radio_nl(struct sk_buff *msg, struct genl_info *info) param.regd = hwsim_world_regdom_custom[idx]; } - return mac80211_hwsim_new_radio(info, ¶m); + ret = mac80211_hwsim_new_radio(info, ¶m); + kfree(hwname); + return ret; } static int hwsim_del_radio_nl(struct sk_buff *msg, struct genl_info *info) From ffe6293d19dc965916177986aec254d0cb9cd71e Mon Sep 17 00:00:00 2001 From: Alexey Kodanev Date: Fri, 17 Nov 2017 19:16:17 +0300 Subject: [PATCH 0283/1013] gre6: use log_ecn_error module parameter in ip6_tnl_rcv() [ Upstream commit 981542c526ecd846920bc500e9989da906ee9fb9 ] After commit 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions") it's not used anywhere in the module, but previously was used in ip6gre_rcv(). Fixes: 308edfdf1563 ("gre6: Cleanup GREv6 receive path, call common GRE functions") Signed-off-by: Alexey Kodanev Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_gre.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 59c121b932ac..5d6bee070871 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -461,7 +461,7 @@ static int ip6gre_rcv(struct sk_buff *skb, const struct tnl_ptk_info *tpi) &ipv6h->saddr, &ipv6h->daddr, tpi->key, tpi->proto); if (tunnel) { - ip6_tnl_rcv(tunnel, skb, tpi, NULL, false); + ip6_tnl_rcv(tunnel, skb, tpi, NULL, log_ecn_error); return PACKET_RCVD; } From 407de7d97a689d478e1c4b4fe2198af9c8cc3fcb Mon Sep 17 00:00:00 2001 From: Xin Long Date: Fri, 17 Nov 2017 14:27:18 +0800 Subject: [PATCH 0284/1013] route: also update fnhe_genid when updating a route cache [ Upstream commit cebe84c6190d741045a322f5343f717139993c08 ] Now when ip route flush cache and it turn out all fnhe_genid != genid. If a redirect/pmtu icmp packet comes and the old fnhe is found and all it's members but fnhe_genid will be updated. Then next time when it looks up route and tries to rebind this fnhe to the new dst, the fnhe will be flushed due to fnhe_genid != genid. It causes this redirect/pmtu icmp packet acutally not to be applied. This patch is to also reset fnhe_genid when updating a route cache. Fixes: 5aad1de5ea2c ("ipv4: use separate genid for next hop exceptions") Acked-by: Hannes Frederic Sowa Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/route.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 3d9f1c2f81c5..aa659433a973 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -651,9 +651,12 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw, struct fnhe_hash_bucket *hash; struct fib_nh_exception *fnhe; struct rtable *rt; + u32 genid, hval; unsigned int i; int depth; - u32 hval = fnhe_hashfun(daddr); + + genid = fnhe_genid(dev_net(nh->nh_dev)); + hval = fnhe_hashfun(daddr); spin_lock_bh(&fnhe_lock); @@ -676,6 +679,8 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw, } if (fnhe) { + if (fnhe->fnhe_genid != genid) + fnhe->fnhe_genid = genid; if (gw) fnhe->fnhe_gw = gw; if (pmtu) { @@ -700,7 +705,7 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw, fnhe->fnhe_next = hash->chain; rcu_assign_pointer(hash->chain, fnhe); } - fnhe->fnhe_genid = fnhe_genid(dev_net(nh->nh_dev)); + fnhe->fnhe_genid = genid; fnhe->fnhe_daddr = daddr; fnhe->fnhe_gw = gw; fnhe->fnhe_pmtu = pmtu; From ad199b18a9c00729bd000cf3565f6a45ebd4e177 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Fri, 17 Nov 2017 14:27:06 +0800 Subject: [PATCH 0285/1013] route: update fnhe_expires for redirect when the fnhe exists [ Upstream commit e39d5246111399dbc6e11cd39fd8580191b86c47 ] Now when creating fnhe for redirect, it sets fnhe_expires for this new route cache. But when updating the exist one, it doesn't do it. It will cause this fnhe never to be expired. Paolo already noticed it before, in Jianlin's test case, it became even worse: When ip route flush cache, the old fnhe is not to be removed, but only clean it's members. When redirect comes again, this fnhe will be found and updated, but never be expired due to fnhe_expires not being set. So fix it by simply updating fnhe_expires even it's for redirect. 
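As a standalone illustration (plain userspace C with made-up names, not kernel code), this is the effect of an update path that never arms the expiry when an expiry of zero means "no deadline":

    #include <stdio.h>
    #include <time.h>

    /* Toy model of a cached exception: expires == 0 means "never expires". */
    struct exception {
        long gw;
        time_t expires;
    };

    static int is_expired(const struct exception *e, time_t now)
    {
        return e->expires && now > e->expires;
    }

    /* Buggy update: refreshes the gateway but never arms a deadline. */
    static void update_buggy(struct exception *e, long gw)
    {
        e->gw = gw;
    }

    /* Fixed update: always (re)arm the deadline, as the patch now does
     * unconditionally for fnhe_expires. */
    static void update_fixed(struct exception *e, long gw, time_t now)
    {
        e->gw = gw;
        e->expires = now + 300;              /* arbitrary lifetime */
    }

    int main(void)
    {
        struct exception e = { 0, 0 };
        time_t now = time(NULL);

        update_buggy(&e, 42);
        printf("buggy: expired a day later? %d\n", is_expired(&e, now + 86400));
        update_fixed(&e, 42, now);
        printf("fixed: expired a day later? %d\n", is_expired(&e, now + 86400));
        return 0;
    }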
Fixes: aee06da6726d ("ipv4: use seqlock for nh_exceptions") Reported-by: Jianlin Shi Acked-by: Hannes Frederic Sowa Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/route.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index aa659433a973..647cfc972bde 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -683,10 +683,9 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw, fnhe->fnhe_genid = genid; if (gw) fnhe->fnhe_gw = gw; - if (pmtu) { + if (pmtu) fnhe->fnhe_pmtu = pmtu; - fnhe->fnhe_expires = max(1UL, expires); - } + fnhe->fnhe_expires = max(1UL, expires); /* Update all cached dsts too */ rt = rcu_dereference(fnhe->fnhe_rth_input); if (rt) From f52688ce0d7bb5bd6932be99faec931be59d9c63 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Thu, 16 Nov 2017 17:39:18 +0000 Subject: [PATCH 0286/1013] rsi: fix memory leak on buf and usb_reg_buf [ Upstream commit d35ef8f846c72d84bfccf239c248c84f79c3a7e8 ] In the cases where len is too long, the error return path fails to kfree allocated buffers buf and usb_reg_buf. The simplest fix is to perform the sanity check on len before the allocations to avoid having to do the kfree'ing in the first place. Detected by CoverityScan, CID#1452258,1452259 ("Resource Leak") Fixes: 59f73e2ae185 ("rsi: check length before USB read/write register") Signed-off-by: Colin Ian King Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/rsi/rsi_91x_usb.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/drivers/net/wireless/rsi/rsi_91x_usb.c b/drivers/net/wireless/rsi/rsi_91x_usb.c index 81df09dd2636..f90c10b3c921 100644 --- a/drivers/net/wireless/rsi/rsi_91x_usb.c +++ b/drivers/net/wireless/rsi/rsi_91x_usb.c @@ -162,13 +162,13 @@ static int rsi_usb_reg_read(struct usb_device *usbdev, u8 *buf; int status = -ENOMEM; + if (len > RSI_USB_CTRL_BUF_SIZE) + return -EINVAL; + buf = kmalloc(RSI_USB_CTRL_BUF_SIZE, GFP_KERNEL); if (!buf) return status; - if (len > RSI_USB_CTRL_BUF_SIZE) - return -EINVAL; - status = usb_control_msg(usbdev, usb_rcvctrlpipe(usbdev, 0), USB_VENDOR_REGISTER_READ, @@ -207,13 +207,13 @@ static int rsi_usb_reg_write(struct usb_device *usbdev, u8 *usb_reg_buf; int status = -ENOMEM; + if (len > RSI_USB_CTRL_BUF_SIZE) + return -EINVAL; + usb_reg_buf = kmalloc(RSI_USB_CTRL_BUF_SIZE, GFP_KERNEL); if (!usb_reg_buf) return status; - if (len > RSI_USB_CTRL_BUF_SIZE) - return -EINVAL; - usb_reg_buf[0] = (value & 0x00ff); usb_reg_buf[1] = (value & 0xff00) >> 8; usb_reg_buf[2] = 0x0; From 65c4767b0e715d9c3310f9d26a7e981caf15c428 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Fri, 17 Nov 2017 15:37:57 -0800 Subject: [PATCH 0287/1013] drivers/rapidio/devices/rio_mport_cdev.c: fix resource leak in error handling path in 'rio_dma_transfer()' [ Upstream commit b1402dcb5643b7a27d46a05edd7491d49ba0e248 ] If 'dma_map_sg()', we should branch to the existing error handling path to free some resources before returning. 
Link: http://lkml.kernel.org/r/61292a4f369229eee03394247385e955027283f8.1505687047.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET Reviewed-by: Logan Gunthorpe Cc: Matt Porter Cc: Alexandre Bounine Cc: Lorenzo Stoakes Cc: Jesper Nilsson Cc: Christian K_nig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/rapidio/devices/rio_mport_cdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/rapidio/devices/rio_mport_cdev.c b/drivers/rapidio/devices/rio_mport_cdev.c index 5beb0c361076..76afe1449cab 100644 --- a/drivers/rapidio/devices/rio_mport_cdev.c +++ b/drivers/rapidio/devices/rio_mport_cdev.c @@ -963,7 +963,8 @@ rio_dma_transfer(struct file *filp, u32 transfer_mode, req->sgt.sgl, req->sgt.nents, dir); if (nents == -EFAULT) { rmcd_error("Failed to map SG list"); - return -EFAULT; + ret = -EFAULT; + goto err_pg; } ret = do_dma_request(req, xfer, sync, nents); From 30c2f774e13547f0dbb68cf779911d4d8920f0bc Mon Sep 17 00:00:00 2001 From: Joe Lawrence Date: Fri, 17 Nov 2017 15:29:17 -0800 Subject: [PATCH 0288/1013] pipe: match pipe_max_size data type with procfs [ Upstream commit 98159d977f71c3b3dee898d1c34e56f520b094e7 ] Patch series "A few round_pipe_size() and pipe-max-size fixups", v3. While backporting Michael's "pipe: fix limit handling" patchset to a distro-kernel, Mikulas noticed that current upstream pipe limit handling contains a few problems: 1 - procfs signed wrap: echo'ing a large number into /proc/sys/fs/pipe-max-size and then cat'ing it back out shows a negative value. 2 - round_pipe_size() nr_pages overflow on 32bit: this would subsequently try roundup_pow_of_two(0), which is undefined. 3 - visible non-rounded pipe-max-size value: there is no mutual exclusion or protection between the time pipe_max_size is assigned a raw value from proc_dointvec_minmax() and when it is rounded. 4 - unsigned long -> unsigned int conversion makes for potential odd return errors from do_proc_douintvec_minmax_conv() and do_proc_dopipe_max_size_conv(). This version underwent the same testing as v1: https://marc.info/?l=linux-kernel&m=150643571406022&w=2 This patch (of 4): pipe_max_size is defined as an unsigned int: unsigned int pipe_max_size = 1048576; but its procfs/sysctl representation is an integer: static struct ctl_table fs_table[] = { ... { .procname = "pipe-max-size", .data = &pipe_max_size, .maxlen = sizeof(int), .mode = 0644, .proc_handler = &pipe_proc_fn, .extra1 = &pipe_min_size, }, ... that is signed: int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf, size_t *lenp, loff_t *ppos) { ... ret = proc_dointvec_minmax(table, write, buf, lenp, ppos) This leads to signed results via procfs for large values of pipe_max_size: % echo 2147483647 >/proc/sys/fs/pipe-max-size % cat /proc/sys/fs/pipe-max-size -2147483648 Use unsigned operations on this variable to avoid such negative values. 
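For illustration only, a standalone userspace sketch (not part of this patch) of the same signed/unsigned mismatch; the value shown is 2^31, which is what rounding 2147483647 up to a power of two produces before it is stored:

    #include <stdio.h>

    int main(void)
    {
        unsigned int pipe_max_size = 2147483648u;   /* 2^31, > INT_MAX */

        /* Reading the unsigned variable back through a signed conversion,
         * as the old proc_dointvec_minmax() path effectively did, shows
         * the negative value quoted above.  (The conversion is
         * implementation-defined in C, but wraps on common ABIs.) */
        printf("%d\n", (int)pipe_max_size);    /* prints -2147483648 */
        printf("%u\n", pipe_max_size);         /* prints  2147483648 */
        return 0;
    }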
Link: http://lkml.kernel.org/r/1507658689-11669-2-git-send-email-joe.lawrence@redhat.com Signed-off-by: Joe Lawrence Reported-by: Mikulas Patocka Reviewed-by: Mikulas Patocka Cc: Michael Kerrisk Cc: Randy Dunlap Cc: Al Viro Cc: Jens Axboe Cc: Josh Poimboeuf Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/pipe.c | 2 +- kernel/sysctl.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/pipe.c b/fs/pipe.c index 349c9d56d4b3..3909c55ed389 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1125,7 +1125,7 @@ int pipe_proc_fn(struct ctl_table *table, int write, void __user *buf, { int ret; - ret = proc_dointvec_minmax(table, write, buf, lenp, ppos); + ret = proc_douintvec_minmax(table, write, buf, lenp, ppos); if (ret < 0 || !write) return ret; diff --git a/kernel/sysctl.c b/kernel/sysctl.c index d9c31bc2eaea..56aca862c4f5 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1822,7 +1822,7 @@ static struct ctl_table fs_table[] = { { .procname = "pipe-max-size", .data = &pipe_max_size, - .maxlen = sizeof(int), + .maxlen = sizeof(pipe_max_size), .mode = 0644, .proc_handler = &pipe_proc_fn, .extra1 = &pipe_min_size, From 346008fe47ed099f1793f2ba6721c9bc5b67e3a3 Mon Sep 17 00:00:00 2001 From: Stephen Bates Date: Fri, 17 Nov 2017 15:28:16 -0800 Subject: [PATCH 0289/1013] lib/genalloc.c: make the avail variable an atomic_long_t [ Upstream commit 36a3d1dd4e16bcd0d2ddfb4a2ec7092f0ae0d931 ] If the amount of resources allocated to a gen_pool exceeds 2^32 then the avail atomic overflows and this causes problems when clients try and borrow resources from the pool. This is only expected to be an issue on 64 bit systems. Add the header to pull in atomic_long* operations. So that 32 bit systems continue to use atomic32_t but 64 bit systems can use atomic64_t. 
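As a standalone illustration (userspace C, not kernel code) of why a 32-bit counter cannot track a pool larger than 4 GiB, assuming a 5 GiB pool:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint64_t pool_bytes = 5ULL << 30;           /* a 5 GiB pool          */
        uint32_t avail32 = (uint32_t)pool_bytes;    /* wraps modulo 2^32     */
        uint64_t avail64 = pool_bytes;              /* what atomic_long_t
                                                       can hold on 64-bit    */

        printf("32-bit avail: %u bytes (4 GiB silently lost)\n",
               (unsigned)avail32);
        printf("64-bit avail: %llu bytes\n", (unsigned long long)avail64);
        return 0;
    }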
Link: http://lkml.kernel.org/r/1509033843-25667-1-git-send-email-sbates@raithlin.com Signed-off-by: Stephen Bates Reviewed-by: Logan Gunthorpe Reviewed-by: Mathieu Desnoyers Reviewed-by: Daniel Mentz Cc: Jonathan Corbet Cc: Andrew Morton Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/genalloc.h | 3 ++- lib/genalloc.c | 10 +++++----- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 6dfec4d638df..872f930f1b06 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -32,6 +32,7 @@ #include #include +#include struct device; struct device_node; @@ -71,7 +72,7 @@ struct gen_pool { */ struct gen_pool_chunk { struct list_head next_chunk; /* next chunk in pool */ - atomic_t avail; + atomic_long_t avail; phys_addr_t phys_addr; /* physical starting address of memory chunk */ unsigned long start_addr; /* start address of memory chunk */ unsigned long end_addr; /* end address of memory chunk (inclusive) */ diff --git a/lib/genalloc.c b/lib/genalloc.c index 144fe6b1a03e..ca06adc4f445 100644 --- a/lib/genalloc.c +++ b/lib/genalloc.c @@ -194,7 +194,7 @@ int gen_pool_add_virt(struct gen_pool *pool, unsigned long virt, phys_addr_t phy chunk->phys_addr = phys; chunk->start_addr = virt; chunk->end_addr = virt + size - 1; - atomic_set(&chunk->avail, size); + atomic_long_set(&chunk->avail, size); spin_lock(&pool->lock); list_add_rcu(&chunk->next_chunk, &pool->chunks); @@ -304,7 +304,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size, nbits = (size + (1UL << order) - 1) >> order; rcu_read_lock(); list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) { - if (size > atomic_read(&chunk->avail)) + if (size > atomic_long_read(&chunk->avail)) continue; start_bit = 0; @@ -324,7 +324,7 @@ unsigned long gen_pool_alloc_algo(struct gen_pool *pool, size_t size, addr = chunk->start_addr + ((unsigned long)start_bit << order); size = nbits << order; - atomic_sub(size, &chunk->avail); + atomic_long_sub(size, &chunk->avail); break; } rcu_read_unlock(); @@ -390,7 +390,7 @@ void gen_pool_free(struct gen_pool *pool, unsigned long addr, size_t size) remain = bitmap_clear_ll(chunk->bits, start_bit, nbits); BUG_ON(remain); size = nbits << order; - atomic_add(size, &chunk->avail); + atomic_long_add(size, &chunk->avail); rcu_read_unlock(); return; } @@ -464,7 +464,7 @@ size_t gen_pool_avail(struct gen_pool *pool) rcu_read_lock(); list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) - avail += atomic_read(&chunk->avail); + avail += atomic_long_read(&chunk->avail); rcu_read_unlock(); return avail; } From 8cb22e0793d41e466bbaa42c01a87afb869c8e9b Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Fri, 17 Nov 2017 15:27:35 -0800 Subject: [PATCH 0290/1013] dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0 [ Upstream commit 1f3c790bd5989fcfec9e53ad8fa09f5b740c958f ] line-range is supposed to treat "1-" as "1-endoffile", so handle the special case by setting last_lineno to UINT_MAX. 
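A minimal standalone model of the range check (simplified, not the kernel function itself); without the special case it fails in exactly the way quoted next:

    #include <limits.h>
    #include <stdio.h>

    /* first_lineno/last_lineno are assumed already parsed; an omitted
     * end such as "1-" leaves last_lineno at 0. */
    static int check_range(unsigned int first_lineno, unsigned int last_lineno)
    {
        /* special case for last lineno not specified */
        if (last_lineno == 0)
            last_lineno = UINT_MAX;              /* i.e. "to end of file" */

        if (last_lineno < first_lineno) {
            printf("last-line:%u < 1st-line:%u\n", last_lineno, first_lineno);
            return -1;
        }
        return 0;
    }

    int main(void)
    {
        return check_range(1, 0) ? 1 : 0;   /* "1-": passes only with the fix */
    }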
Fixes this error: dynamic_debug:ddebug_parse_query: last-line:0 < 1st-line:1 dynamic_debug:ddebug_exec_query: query parse failed Link: http://lkml.kernel.org/r/10a6a101-e2be-209f-1f41-54637824788e@infradead.org Signed-off-by: Randy Dunlap Acked-by: Jason Baron Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- lib/dynamic_debug.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c index da796e2dc4f5..c7c96bc7654a 100644 --- a/lib/dynamic_debug.c +++ b/lib/dynamic_debug.c @@ -360,6 +360,10 @@ static int ddebug_parse_query(char *words[], int nwords, if (parse_lineno(last, &query->last_lineno) < 0) return -EINVAL; + /* special case for last lineno not specified */ + if (query->last_lineno == 0) + query->last_lineno = UINT_MAX; + if (query->last_lineno < query->first_lineno) { pr_err("last-line:%d < 1st-line:%d\n", query->last_lineno, From 57f94fd10554a2d73e92490ba0eb8e575b8b8fbb Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 6 Nov 2017 15:28:04 -0500 Subject: [PATCH 0291/1013] NFS: Fix a typo in nfs_rename() [ Upstream commit d803224c84be067754db7fa58a93f36f61566493 ] On successful rename, the "old_dentry" is retained and is attached to the "new_dir", so we need to call nfs_set_verifier() accordingly. Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/nfs/dir.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index b03b3bc05f96..bf2c43635062 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -2064,7 +2064,7 @@ int nfs_rename(struct inode *old_dir, struct dentry *old_dentry, * should mark the directories for revalidation. */ d_move(old_dentry, new_dentry); - nfs_set_verifier(new_dentry, + nfs_set_verifier(old_dentry, nfs_save_change_attribute(new_dir)); } else if (error == -ENOENT) nfs_dentry_handle_enoent(old_dentry); From 94d6b7fa7226dba65a3ea718db6499a87b7623c3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 3 Nov 2017 13:46:06 -0400 Subject: [PATCH 0292/1013] sunrpc: Fix rpc_task_begin trace point [ Upstream commit b2bfe5915d5fe7577221031a39ac722a0a2a1199 ] The rpc_task_begin trace point always display a task ID of zero. Move the trace point call site so that it picks up the new task ID. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/sunrpc/sched.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c index 0cc83839c13c..f9db5fe52d36 100644 --- a/net/sunrpc/sched.c +++ b/net/sunrpc/sched.c @@ -274,10 +274,9 @@ static inline void rpc_task_set_debuginfo(struct rpc_task *task) static void rpc_set_active(struct rpc_task *task) { - trace_rpc_task_begin(task->tk_client, task, NULL); - rpc_task_set_debuginfo(task); set_bit(RPC_TASK_ACTIVE, &task->tk_runstate); + trace_rpc_task_begin(task->tk_client, task, NULL); } /* From cfcbc4f35a73c95d8843495274cd9fff1d5f4c33 Mon Sep 17 00:00:00 2001 From: Dirk van der Merwe Date: Thu, 16 Nov 2017 17:06:41 -0800 Subject: [PATCH 0293/1013] nfp: inherit the max_mtu from the PF netdev [ Upstream commit 743ba5b47f7961fb29f2e06bb694fb4f068ac58f ] The PF netdev is used for data transfer for reprs, so reprs inherit the maximum MTU settings of the PF netdev. 
Fixes: 5de73ee46704 ("nfp: general representor implementation") Signed-off-by: Dirk van der Merwe Reviewed-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/netronome/nfp/nfp_net_repr.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c index d540a9dc77b3..1c43aca8162d 100644 --- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.c @@ -297,6 +297,8 @@ int nfp_repr_init(struct nfp_app *app, struct net_device *netdev, netdev->netdev_ops = &nfp_repr_netdev_ops; netdev->ethtool_ops = &nfp_port_ethtool_ops; + netdev->max_mtu = pf_netdev->max_mtu; + SWITCHDEV_SET_OPS(netdev, &nfp_port_switchdev_ops); if (nfp_app_has_tc(app)) { From 1cb98be5f989fc2820d7198170098939c38579a3 Mon Sep 17 00:00:00 2001 From: Pieter Jansen van Vuuren Date: Thu, 16 Nov 2017 17:06:39 -0800 Subject: [PATCH 0294/1013] nfp: fix flower offload metadata flag usage [ Upstream commit 6c3ab204f4ca00374a374bc0fc9a275b64d1bcbb ] Hardware has no notion of new or last mask id, instead it makes use of the message type (i.e. add flow or del flow) in combination with a single bit in metadata flags to determine when to add or delete a mask id. Previously we made use of the new or last flags to indicate that a new mask should be allocated or deallocated, respectively. This incorrect behaviour is fixed by making use single bit in metadata flags to indicate mask allocation or deallocation. Fixes: 43f84b72c50d ("nfp: add metadata to each flow offload") Signed-off-by: Pieter Jansen van Vuuren Reviewed-by: Jakub Kicinski Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/netronome/nfp/flower/main.h | 3 +-- drivers/net/ethernet/netronome/nfp/flower/metadata.c | 7 +++++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/netronome/nfp/flower/main.h b/drivers/net/ethernet/netronome/nfp/flower/main.h index c20dd00a1cae..899e7d53e669 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/main.h +++ b/drivers/net/ethernet/netronome/nfp/flower/main.h @@ -52,8 +52,7 @@ struct nfp_app; #define NFP_FLOWER_MASK_ELEMENT_RS 1 #define NFP_FLOWER_MASK_HASH_BITS 10 -#define NFP_FL_META_FLAG_NEW_MASK 128 -#define NFP_FL_META_FLAG_LAST_MASK 1 +#define NFP_FL_META_FLAG_MANAGE_MASK BIT(7) #define NFP_FL_MASK_REUSE_TIME_NS 40000 #define NFP_FL_MASK_ID_LOCATION 1 diff --git a/drivers/net/ethernet/netronome/nfp/flower/metadata.c b/drivers/net/ethernet/netronome/nfp/flower/metadata.c index 3226ddc55f99..d9582ccc0025 100644 --- a/drivers/net/ethernet/netronome/nfp/flower/metadata.c +++ b/drivers/net/ethernet/netronome/nfp/flower/metadata.c @@ -282,7 +282,7 @@ nfp_check_mask_add(struct nfp_app *app, char *mask_data, u32 mask_len, id = nfp_add_mask_table(app, mask_data, mask_len); if (id < 0) return false; - *meta_flags |= NFP_FL_META_FLAG_NEW_MASK; + *meta_flags |= NFP_FL_META_FLAG_MANAGE_MASK; } *mask_id = id; @@ -299,6 +299,9 @@ nfp_check_mask_remove(struct nfp_app *app, char *mask_data, u32 mask_len, if (!mask_entry) return false; + if (meta_flags) + *meta_flags &= ~NFP_FL_META_FLAG_MANAGE_MASK; + *mask_id = mask_entry->mask_id; mask_entry->ref_cnt--; if (!mask_entry->ref_cnt) { @@ -306,7 +309,7 @@ nfp_check_mask_remove(struct nfp_app *app, char *mask_data, u32 mask_len, nfp_release_mask_id(app, *mask_id); kfree(mask_entry); if 
(meta_flags) - *meta_flags |= NFP_FL_META_FLAG_LAST_MASK; + *meta_flags |= NFP_FL_META_FLAG_MANAGE_MASK; } return true; From f4da9e07a6a27939f9e976ade9945c735ad026b9 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Tue, 14 Nov 2017 16:34:44 -0800 Subject: [PATCH 0295/1013] xfs: fix forgotten rcu read unlock when skipping inode reclaim [ Upstream commit 962cc1ad6caddb5abbb9f0a43e5abe7131a71f18 ] In commit f2e9ad21 ("xfs: check for race with xfs_reclaim_inode"), we skip an inode if we're racing with freeing the inode via xfs_reclaim_inode, but we forgot to release the rcu read lock when dumping the inode, with the result that we exit to userspace with a lock held. Don't do that; generic/320 with a 1k block size fails this very occasionally. ================================================ WARNING: lock held when returning to user space! 4.14.0-rc6-djwong #4 Tainted: G W ------------------------------------------------ rm/30466 is leaving the kernel with locks still held! 1 lock held by rm/30466: #0: (rcu_read_lock){....}, at: [] xfs_ifree_cluster.isra.17+0x2c3/0x6f0 [xfs] ------------[ cut here ]------------ WARNING: CPU: 1 PID: 30466 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x71/0x700 Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug] CPU: 1 PID: 30466 Comm: rm Tainted: G W 4.14.0-rc6-djwong #4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014 task: ffff880037680000 task.stack: ffffc90001064000 RIP: 0010:rcu_note_context_switch+0x71/0x700 RSP: 0000:ffffc90001067e50 EFLAGS: 00010002 RAX: 0000000000000001 RBX: ffff880037680000 RCX: ffff88003e73d200 RDX: 0000000000000002 RSI: ffffffff819e53e9 RDI: ffffffff819f4375 RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880062c900d0 R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037680000 R13: 0000000000000000 R14: ffffc90001067eb8 R15: ffff880037680690 FS: 00007fa3b8ce8700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f69bf77c000 CR3: 000000002450a000 CR4: 00000000000006e0 Call Trace: __schedule+0xb8/0xb10 schedule+0x40/0x90 exit_to_usermode_loop+0x6b/0xa0 prepare_exit_to_usermode+0x7a/0x90 retint_user+0x8/0x20 RIP: 0033:0x7fa3b87fda87 RSP: 002b:00007ffe41206568 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02 RAX: 0000000000000000 RBX: 00000000010e88c0 RCX: 00007fa3b87fda87 RDX: 0000000000000000 RSI: 00000000010e89c8 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000 R10: 000000000000015e R11: 0000000000000246 R12: 00000000010c8060 R13: 00007ffe41206690 R14: 0000000000000000 R15: 0000000000000000 ---[ end trace e88f83bf0cfbd07d ]--- Fixes: f2e9ad212def50bcf4c098c6288779dd97fff0f0 Cc: Omar Sandoval Signed-off-by: Darrick J. 
Wong Reviewed-by: Christoph Hellwig Reviewed-by: Omar Sandoval Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/xfs/xfs_inode.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 4ec5b7f45401..63350906961a 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2378,6 +2378,7 @@ xfs_ifree_cluster( */ if (ip->i_ino != inum + i) { xfs_iunlock(ip, XFS_ILOCK_EXCL); + rcu_read_unlock(); continue; } } From 84e0b87ebfb56041d8130bf220d1494515332b65 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Thu, 9 Nov 2017 18:07:17 +0100 Subject: [PATCH 0296/1013] dt-bindings: usb: fix reg-property port-number range [ Upstream commit f42ae7b0540937e00fe005812997f126aaac4bc2 ] The USB hub port-number range for USB 2.0 is 1-255 and not 1-31 which reflects an arbitrary limit set by the current Linux implementation. Note that for USB 3.1 hubs the valid range is 1-15. Increase the documented valid range in the binding to 255, which is the maximum allowed by the specifications. Signed-off-by: Johan Hovold Signed-off-by: Rob Herring Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- Documentation/devicetree/bindings/usb/usb-device.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/usb/usb-device.txt b/Documentation/devicetree/bindings/usb/usb-device.txt index ce02cebac26a..464ddf7b509a 100644 --- a/Documentation/devicetree/bindings/usb/usb-device.txt +++ b/Documentation/devicetree/bindings/usb/usb-device.txt @@ -11,7 +11,7 @@ Required properties: be used, but a device adhering to this binding may leave out all except for usbVID,PID. - reg: the port number which this device is connecting to, the range - is 1-31. + is 1-255. Example: From 60bed713ab191588edeba77f2cfa8cc13ccb7000 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Thu, 16 Nov 2017 08:08:44 +0800 Subject: [PATCH 0297/1013] block: wake up all tasks blocked in get_request() [ Upstream commit 34d9715ac1edd50285168dd8d80c972739a4f6a4 ] Once blk_set_queue_dying() is done in blk_cleanup_queue(), we call blk_freeze_queue() and wait for q->q_usage_counter becoming zero. But if there are tasks blocked in get_request(), q->q_usage_counter can never become zero. So we have to wake up all these tasks in blk_set_queue_dying() first. Fixes: 3ef28e83ab157997 ("block: generic request_queue reference counting") Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- block/blk-core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index 33ee583cfe45..516ce3174683 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -605,8 +605,8 @@ void blk_set_queue_dying(struct request_queue *q) spin_lock_irq(q->queue_lock); blk_queue_for_each_rl(rl, q) { if (rl->rq_pool) { - wake_up(&rl->wait[BLK_RW_SYNC]); - wake_up(&rl->wait[BLK_RW_ASYNC]); + wake_up_all(&rl->wait[BLK_RW_SYNC]); + wake_up_all(&rl->wait[BLK_RW_ASYNC]); } } spin_unlock_irq(q->queue_lock); From 99dac8af6e6fe6100aaaba0a69e44fcb886c2b92 Mon Sep 17 00:00:00 2001 From: Pavel Tatashin Date: Wed, 15 Nov 2017 17:36:18 -0800 Subject: [PATCH 0298/1013] sparc64/mm: set fields in deferred pages [ Upstream commit 2a20aa171071a334d80c4e5d5af719d8374702fc ] Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT), flags and other fields in "struct page"es are never changed prior to first initializing struct pages by going through __init_single_page(). 
With deferred struct page feature enabled there is a case where we set some fields prior to initializing: mem_init() { register_page_bootmem_info(); free_all_bootmem(); ... } When register_page_bootmem_info() is called only non-deferred struct pages are initialized. But, this function goes through some reserved pages which might be part of the deferred, and thus are not yet initialized. mem_init register_page_bootmem_info register_page_bootmem_info_node get_page_bootmem .. setting fields here .. such as: page->freelist = (void *)type; free_all_bootmem() free_low_memory_core_early() for_each_reserved_mem_region() reserve_bootmem_region() init_reserved_page() <- Only if this is deferred reserved page __init_single_pfn() __init_single_page() memset(0) <-- Loose the set fields here We end up with similar issue as in the previous patch, where currently we do not observe problem as memory is zeroed. But, if flag asserts are changed we can start hitting issues. Also, because in this patch series we will stop zeroing struct page memory during allocation, we must make sure that struct pages are properly initialized prior to using them. The deferred-reserved pages are initialized in free_all_bootmem(). Therefore, the fix is to switch the above calls. Link: http://lkml.kernel.org/r/20171013173214.27300-4-pasha.tatashin@oracle.com Signed-off-by: Pavel Tatashin Reviewed-by: Steven Sistare Reviewed-by: Daniel Jordan Reviewed-by: Bob Picco Acked-by: David S. Miller Acked-by: Michal Hocko Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Christian Borntraeger Cc: Dmitry Vyukov Cc: Heiko Carstens Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Mark Rutland Cc: Matthew Wilcox Cc: Mel Gorman Cc: Michal Hocko Cc: Sam Ravnborg Cc: Thomas Gleixner Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/sparc/mm/init_64.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c index 61bdc1270d19..a0cc1be767c8 100644 --- a/arch/sparc/mm/init_64.c +++ b/arch/sparc/mm/init_64.c @@ -2540,9 +2540,16 @@ void __init mem_init(void) { high_memory = __va(last_valid_pfn << PAGE_SHIFT); - register_page_bootmem_info(); free_all_bootmem(); + /* + * Must be done after boot memory is put on freelist, because here we + * might set fields in deferred struct pages that have not yet been + * initialized, and free_all_bootmem() initializes all the reserved + * deferred pages for us. + */ + register_page_bootmem_info(); + /* * Set up the zero page, mark it reserved, so that page count * is not manipulated when freeing the page from user ptes. From 1238334082ea47738c79c2acb4546767593e1a75 Mon Sep 17 00:00:00 2001 From: Sergey Senozhatsky Date: Wed, 15 Nov 2017 17:34:03 -0800 Subject: [PATCH 0299/1013] zsmalloc: calling zs_map_object() from irq is a bug [ Upstream commit 1aedcafbf32b3f232c159b14cd0d423fcfe2b861 ] Use BUG_ON(in_interrupt()) in zs_map_object(). This is not a new BUG_ON(), it's always been there, but was recently changed to VM_BUG_ON(). There are several problems there. First, we use use per-CPU mappings both in zsmalloc and in zram, and interrupt may easily corrupt those buffers. Second, and more importantly, we believe it's possible to start leaking sensitive information. 
Consider the following case: -> process P swap out zram per-cpu mapping CPU1 compress page A -> IRQ swap out zram per-cpu mapping CPU1 compress page B write page from per-cpu mapping CPU1 to zsmalloc pool iret -> process P write page from per-cpu mapping CPU1 to zsmalloc pool [*] return * so we store overwritten data that actually belongs to another page (task) and potentially contains sensitive data. And when process P will page fault it's going to read (swap in) that other task's data. Link: http://lkml.kernel.org/r/20170929045140.4055-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky Acked-by: Minchan Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/zsmalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c index 7c38e850a8fc..685049a9048d 100644 --- a/mm/zsmalloc.c +++ b/mm/zsmalloc.c @@ -1349,7 +1349,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long handle, * pools/users, we can't allow mapping in interrupt context * because it can corrupt another users mappings. */ - WARN_ON_ONCE(in_interrupt()); + BUG_ON(in_interrupt()); /* From now on, migration cannot move the object */ pin_tag(handle); From 5ca94e03675a961424d47ee8d21c4b84606b742d Mon Sep 17 00:00:00 2001 From: Miles Chen Date: Wed, 15 Nov 2017 17:32:25 -0800 Subject: [PATCH 0300/1013] slub: fix sysfs duplicate filename creation when slub_debug=O [ Upstream commit 11066386efa692f77171484c32ea30f6e5a0d729 ] When slub_debug=O is set. It is possible to clear debug flags for an "unmergeable" slab cache in kmem_cache_open(). It makes the "unmergeable" cache became "mergeable" in sysfs_slab_add(). These caches will generate their "unique IDs" by create_unique_id(), but it is possible to create identical unique IDs. In my experiment, sgpool-128, names_cache, biovec-256 generate the same ID ":Ft-0004096" and the kernel reports "sysfs: cannot create duplicate filename '/kernel/slab/:Ft-0004096'". To repeat my experiment, set disable_higher_order_debug=1, CONFIG_SLUB_DEBUG_ON=y in kernel-4.14. Fix this issue by setting unmergeable=1 if slub_debug=O and the the default slub_debug contains any no-merge flags. 
call path: kmem_cache_create() __kmem_cache_alias() -> we set SLAB_NEVER_MERGE flags here create_cache() __kmem_cache_create() kmem_cache_open() -> clear DEBUG_METADATA_FLAGS sysfs_slab_add() -> the slab cache is mergeable now sysfs: cannot create duplicate filename '/kernel/slab/:Ft-0004096' ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.14.0-rc7ajb-00131-gd4c2e9f-dirty #123 Hardware name: linux,dummy-virt (DT) task: ffffffc07d4e0080 task.stack: ffffff8008008000 PC is at sysfs_warn_dup+0x60/0x7c LR is at sysfs_warn_dup+0x60/0x7c pc : lr : pstate: 60000145 Call trace: sysfs_warn_dup+0x60/0x7c sysfs_create_dir_ns+0x98/0xa0 kobject_add_internal+0xa0/0x294 kobject_init_and_add+0x90/0xb4 sysfs_slab_add+0x90/0x200 __kmem_cache_create+0x26c/0x438 kmem_cache_create+0x164/0x1f4 sg_pool_init+0x60/0x100 do_one_initcall+0x38/0x12c kernel_init_freeable+0x138/0x1d4 kernel_init+0x10/0xfc ret_from_fork+0x10/0x18 Link: http://lkml.kernel.org/r/1510365805-5155-1-git-send-email-miles.chen@mediatek.com Signed-off-by: Miles Chen Acked-by: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- mm/slub.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/mm/slub.c b/mm/slub.c index 1efbb8123037..8e1c027a30f4 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -5704,6 +5704,10 @@ static int sysfs_slab_add(struct kmem_cache *s) return 0; } + if (!unmergeable && disable_higher_order_debug && + (slub_debug & DEBUG_METADATA_FLAGS)) + unmergeable = 1; + if (unmergeable) { /* * Slabcache can never be merged so we can use the name proper. From 191d96120f955f23474176da8272a26fd6d0b9bd Mon Sep 17 00:00:00 2001 From: Xin Long Date: Wed, 15 Nov 2017 16:55:54 +0800 Subject: [PATCH 0301/1013] sctp: do not free asoc when it is already dead in sctp_sendmsg [ Upstream commit ca3af4dd28cff4e7216e213ba3b671fbf9f84758 ] Now in sctp_sendmsg sctp_wait_for_sndbuf could schedule out without holding sock sk. It means the current asoc can be freed elsewhere, like when receiving an abort packet. If the asoc is just created in sctp_sendmsg and sctp_wait_for_sndbuf returns err, the asoc will be freed again due to new_asoc is not nil. An use-after-free issue would be triggered by this. This patch is to fix it by setting new_asoc with nil if the asoc is already dead when cpu schedules back, so that it will not be freed again in sctp_sendmsg. v1->v2: set new_asoc as nil in sctp_sendmsg instead of sctp_wait_for_sndbuf. Suggested-by: Neil Horman Reported-by: Dmitry Vyukov Signed-off-by: Xin Long Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 6f45d1713452..e62251fc6f49 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -1963,8 +1963,14 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len) timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); if (!sctp_wspace(asoc)) { err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len); - if (err) + if (err) { + if (err == -ESRCH) { + /* asoc is already dead. 
*/ + new_asoc = NULL; + err = -EPIPE; + } goto out_free; + } } /* If an address is passed with the sendto/sendmsg call, it is used @@ -7839,10 +7845,11 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, for (;;) { prepare_to_wait_exclusive(&asoc->wait, &wait, TASK_INTERRUPTIBLE); + if (asoc->base.dead) + goto do_dead; if (!*timeo_p) goto do_nonblock; - if (sk->sk_err || asoc->state >= SCTP_STATE_SHUTDOWN_PENDING || - asoc->base.dead) + if (sk->sk_err || asoc->state >= SCTP_STATE_SHUTDOWN_PENDING) goto do_error; if (signal_pending(current)) goto do_interrupted; @@ -7867,6 +7874,10 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, return err; +do_dead: + err = -ESRCH; + goto out; + do_error: err = -EPIPE; goto out; From 0c7e787bfc91cb374b60ba5fe7055178c94a7cb7 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Wed, 15 Nov 2017 16:57:26 +0800 Subject: [PATCH 0302/1013] sctp: use the right sk after waking up from wait_buf sleep [ Upstream commit cea0cc80a6777beb6eb643d4ad53690e1ad1d4ff ] Commit dfcb9f4f99f1 ("sctp: deny peeloff operation on asocs with threads sleeping on it") fixed the race between peeloff and wait sndbuf by checking waitqueue_active(&asoc->wait) in sctp_do_peeloff(). But it actually doesn't work, as even if waitqueue_active returns false the waiting sndbuf thread may still not yet hold sk lock. After asoc is peeled off, sk is not asoc->base.sk any more, then to hold the old sk lock couldn't make assoc safe to access. This patch is to fix this by changing to hold the new sk lock if sk is not asoc->base.sk, meanwhile, also set the sk in sctp_sendmsg with the new sk. With this fix, there is no more race between peeloff and waitbuf, the check 'waitqueue_active' in sctp_do_peeloff can be removed. Thanks Marcelo and Neil for making this clear. v1->v2: fix it by changing to lock the new sock instead of adding a flag in asoc. Suggested-by: Neil Horman Signed-off-by: Xin Long Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index e62251fc6f49..14c28fbfe6b8 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -83,8 +83,8 @@ /* Forward declarations for internal helper functions. */ static int sctp_writeable(struct sock *sk); static void sctp_wfree(struct sk_buff *skb); -static int sctp_wait_for_sndbuf(struct sctp_association *, long *timeo_p, - size_t msg_len); +static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, + size_t msg_len, struct sock **orig_sk); static int sctp_wait_for_packet(struct sock *sk, int *err, long *timeo_p); static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p); static int sctp_wait_for_accept(struct sock *sk, long timeo); @@ -1962,7 +1962,8 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len) timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT); if (!sctp_wspace(asoc)) { - err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len); + /* sk can be changed by peel off when waiting for buf. */ + err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len, &sk); if (err) { if (err == -ESRCH) { /* asoc is already dead. 
*/ @@ -4949,12 +4950,6 @@ int sctp_do_peeloff(struct sock *sk, sctp_assoc_t id, struct socket **sockp) if (!asoc) return -EINVAL; - /* If there is a thread waiting on more sndbuf space for - * sending on this asoc, it cannot be peeled. - */ - if (waitqueue_active(&asoc->wait)) - return -EBUSY; - /* An association cannot be branched off from an already peeled-off * socket, nor is this supported for tcp style sockets. */ @@ -7828,7 +7823,7 @@ void sctp_sock_rfree(struct sk_buff *skb) /* Helper function to wait for space in the sndbuf. */ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, - size_t msg_len) + size_t msg_len, struct sock **orig_sk) { struct sock *sk = asoc->base.sk; int err = 0; @@ -7862,11 +7857,17 @@ static int sctp_wait_for_sndbuf(struct sctp_association *asoc, long *timeo_p, release_sock(sk); current_timeo = schedule_timeout(current_timeo); lock_sock(sk); + if (sk != asoc->base.sk) { + release_sock(sk); + sk = asoc->base.sk; + lock_sock(sk); + } *timeo_p = current_timeo; } out: + *orig_sk = sk; finish_wait(&asoc->wait, &wait); /* Release the association's refcnt. */ From 6e9c2a05c368a0b53d5b6647798e67830f5508e0 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 14 Nov 2017 14:43:56 -0500 Subject: [PATCH 0303/1013] fcntl: don't leak fd reference when fixup_compat_flock fails [ Upstream commit 9280a601e6080c9ff658468c1c775ff6514099a6 ] Currently we just return err here, but we need to put the fd reference first. Fixes: 94073ad77fff (fs/locks: don't mess with the address limit in compat_fcntl64) Signed-off-by: Jeff Layton Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/fcntl.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/fcntl.c b/fs/fcntl.c index 8d78ffd7b399..6fd311367efc 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -632,9 +632,8 @@ COMPAT_SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, if (err) break; err = fixup_compat_flock(&flock); - if (err) - return err; - err = put_compat_flock(&flock, compat_ptr(arg)); + if (!err) + err = put_compat_flock(&flock, compat_ptr(arg)); break; case F_GETLK64: case F_OFD_GETLK: From 81a1c2d3f9eb5996714d47dc0fcab91cee0678d4 Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Wed, 15 Nov 2017 09:43:09 +0800 Subject: [PATCH 0304/1013] geneve: fix fill_info when link down [ Upstream commit fd7eafd02121d6ef501ef1a4a891e6061366c952 ] geneve->sock4/6 were added with geneve_open and released with geneve_stop. So when geneve link down, we will not able to show remote address and checksum info after commit 11387fe4a98 ("geneve: fix fill_info when using collect_metadata"). Fix this by avoid passing *_REMOTE{,6} for COLLECT_METADATA since they are mutually exclusive, and always show UDP_ZERO_CSUM6_RX info. Fixes: 11387fe4a98 ("geneve: fix fill_info when using collect_metadata") Signed-off-by: Hangbin Liu Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/geneve.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index ed51018a813e..b9d8d71a6ecc 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -1503,6 +1503,7 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev) { struct geneve_dev *geneve = netdev_priv(dev); struct ip_tunnel_info *info = &geneve->info; + bool metadata = geneve->collect_md; __u8 tmp_vni[3]; __u32 vni; @@ -1511,32 +1512,24 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev) if (nla_put_u32(skb, IFLA_GENEVE_ID, vni)) goto nla_put_failure; - if (rtnl_dereference(geneve->sock4)) { + if (!metadata && ip_tunnel_info_af(info) == AF_INET) { if (nla_put_in_addr(skb, IFLA_GENEVE_REMOTE, info->key.u.ipv4.dst)) goto nla_put_failure; - if (nla_put_u8(skb, IFLA_GENEVE_UDP_CSUM, !!(info->key.tun_flags & TUNNEL_CSUM))) goto nla_put_failure; - } - #if IS_ENABLED(CONFIG_IPV6) - if (rtnl_dereference(geneve->sock6)) { + } else if (!metadata) { if (nla_put_in6_addr(skb, IFLA_GENEVE_REMOTE6, &info->key.u.ipv6.dst)) goto nla_put_failure; - if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_TX, !(info->key.tun_flags & TUNNEL_CSUM))) goto nla_put_failure; - - if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, - !geneve->use_udp6_rx_checksums)) - goto nla_put_failure; - } #endif + } if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) || nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) || @@ -1546,10 +1539,13 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev) if (nla_put_be16(skb, IFLA_GENEVE_PORT, info->key.tp_dst)) goto nla_put_failure; - if (geneve->collect_md) { - if (nla_put_flag(skb, IFLA_GENEVE_COLLECT_METADATA)) + if (metadata && nla_put_flag(skb, IFLA_GENEVE_COLLECT_METADATA)) goto nla_put_failure; - } + + if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX, + !geneve->use_udp6_rx_checksums)) + goto nla_put_failure; + return 0; nla_put_failure: From b316280c81337971e5cbf239400186425de7bae9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 14 Nov 2017 17:15:50 -0800 Subject: [PATCH 0305/1013] bpf: fix lockdep splat [ Upstream commit 89ad2fa3f043a1e8daae193bcb5fe34d5f8caf28 ] pcpu_freelist_pop() needs the same lockdep awareness than pcpu_freelist_populate() to avoid a false positive. [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire: (&htab->buckets[i].lock){......}, at: [] __htab_percpu_map_update_elem+0x1cb/0x300 and this task is already holding: (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [] __dev_queue_xmit+0 x868/0x1240 which would create a new lock dependency: (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......} but this new dependency connects a SOFTIRQ-irq-safe lock: (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} ... 
which became SOFTIRQ-irq-safe at: [] __lock_acquire+0x42b/0x1f10 [] lock_acquire+0xbc/0x1b0 [] _raw_spin_lock+0x38/0x50 [] __dev_queue_xmit+0x868/0x1240 [] dev_queue_xmit+0x10/0x20 [] ip_finish_output2+0x439/0x590 [] ip_finish_output+0x150/0x2f0 [] ip_output+0x7d/0x260 [] ip_local_out+0x5e/0xe0 [] ip_queue_xmit+0x205/0x620 [] tcp_transmit_skb+0x5a8/0xcb0 [] tcp_write_xmit+0x242/0x1070 [] __tcp_push_pending_frames+0x3c/0xf0 [] tcp_rcv_established+0x312/0x700 [] tcp_v4_do_rcv+0x11c/0x200 [] tcp_v4_rcv+0xaa2/0xc30 [] ip_local_deliver_finish+0xa7/0x240 [] ip_local_deliver+0x66/0x200 [] ip_rcv_finish+0xdd/0x560 [] ip_rcv+0x295/0x510 [] __netif_receive_skb_core+0x988/0x1020 [] __netif_receive_skb+0x21/0x70 [] process_backlog+0x6f/0x230 [] net_rx_action+0x229/0x420 [] __do_softirq+0xd8/0x43d [] do_softirq_own_stack+0x1c/0x30 [] do_softirq+0x55/0x60 [] __local_bh_enable_ip+0xa8/0xb0 [] cpu_startup_entry+0x1c7/0x500 [] start_secondary+0x113/0x140 to a SOFTIRQ-irq-unsafe lock: (&head->lock){+.+...} ... which became SOFTIRQ-irq-unsafe at: ... [] __lock_acquire+0x82f/0x1f10 [] lock_acquire+0xbc/0x1b0 [] _raw_spin_lock+0x38/0x50 [] pcpu_freelist_pop+0x7a/0xb0 [] htab_map_alloc+0x50c/0x5f0 [] SyS_bpf+0x265/0x1200 [] entry_SYSCALL_64_fastpath+0x12/0x17 other info that might help us debug this: Chain exists of: dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&head->lock); local_irq_disable(); lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2); lock(&htab->buckets[i].lock); lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2); *** DEADLOCK *** Fixes: e19494edab82 ("bpf: introduce percpu_freelist") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/percpu_freelist.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/kernel/bpf/percpu_freelist.c b/kernel/bpf/percpu_freelist.c index 5c51d1985b51..673fa6fe2d73 100644 --- a/kernel/bpf/percpu_freelist.c +++ b/kernel/bpf/percpu_freelist.c @@ -78,8 +78,10 @@ struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s) { struct pcpu_freelist_head *head; struct pcpu_freelist_node *node; + unsigned long flags; int orig_cpu, cpu; + local_irq_save(flags); orig_cpu = cpu = raw_smp_processor_id(); while (1) { head = per_cpu_ptr(s->freelist, cpu); @@ -87,14 +89,16 @@ struct pcpu_freelist_node *pcpu_freelist_pop(struct pcpu_freelist *s) node = head->first; if (node) { head->first = node->next; - raw_spin_unlock(&head->lock); + raw_spin_unlock_irqrestore(&head->lock, flags); return node; } raw_spin_unlock(&head->lock); cpu = cpumask_next(cpu, cpu_possible_mask); if (cpu >= nr_cpu_ids) cpu = 0; - if (cpu == orig_cpu) + if (cpu == orig_cpu) { + local_irq_restore(flags); return NULL; + } } } From 8474be654b2723426f37ad95a48bff1acdb022f7 Mon Sep 17 00:00:00 2001 From: Gabriel Fernandez Date: Wed, 11 Oct 2017 08:57:24 +0200 Subject: [PATCH 0306/1013] clk: stm32h7: fix test of clock config [ Upstream commit c1ea839c41d049604a3f64ef72712d1c7c6639d0 ] fix test of composite clock config (bad copy / past) Signed-off-by: Gabriel Fernandez Fixes: 3e4d618b0722 ("clk: stm32h7: Add stm32h743 clock driver") Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/clk-stm32h7.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/clk/clk-stm32h7.c b/drivers/clk/clk-stm32h7.c index 
a94c3f56c590..61c3e40507d3 100644 --- a/drivers/clk/clk-stm32h7.c +++ b/drivers/clk/clk-stm32h7.c @@ -384,7 +384,7 @@ static void get_cfg_composite_div(const struct composite_clk_gcfg *gcfg, mux_ops = div_ops = gate_ops = NULL; mux_hw = div_hw = gate_hw = NULL; - if (gcfg->mux && gcfg->mux) { + if (gcfg->mux && cfg->mux) { mux = _get_cmux(base + cfg->mux->offset, cfg->mux->shift, cfg->mux->width, @@ -410,7 +410,7 @@ static void get_cfg_composite_div(const struct composite_clk_gcfg *gcfg, } } - if (gcfg->gate && gcfg->gate) { + if (gcfg->gate && cfg->gate) { gate = _get_cgate(base + cfg->gate->offset, cfg->gate->bit_idx, gcfg->gate->flags, lock); From de1090298880468eb6e29e5b1e046862f2301950 Mon Sep 17 00:00:00 2001 From: Mylene JOSSERAND Date: Sun, 5 Nov 2017 17:51:34 +0100 Subject: [PATCH 0307/1013] clk: sunxi-ng: a83t: Fix i2c buses bits MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit cc54c0955d6f8618a38a999eecdc3d95306b90de ] i2c1 and i2c2 bits for CCU are not bit 0 but bit 1 and bit 2. Because of that, the i2c0 (bit 0) was not correctly configured. Fixed the correct bits for i2c1 and i2c2. Fixes: 05359be1176b ("clk: sunxi-ng: Add driver for A83T CCU") Signed-off-by: Mylène Josserand Acked-by: Maxime Ripard Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/sunxi-ng/ccu-sun8i-a83t.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c b/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c index e43acebdfbcd..f8203115a6bc 100644 --- a/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c +++ b/drivers/clk/sunxi-ng/ccu-sun8i-a83t.c @@ -354,9 +354,9 @@ static SUNXI_CCU_GATE(bus_tdm_clk, "bus-tdm", "apb1", static SUNXI_CCU_GATE(bus_i2c0_clk, "bus-i2c0", "apb2", 0x06c, BIT(0), 0); static SUNXI_CCU_GATE(bus_i2c1_clk, "bus-i2c1", "apb2", - 0x06c, BIT(0), 0); + 0x06c, BIT(1), 0); static SUNXI_CCU_GATE(bus_i2c2_clk, "bus-i2c2", "apb2", - 0x06c, BIT(0), 0); + 0x06c, BIT(2), 0); static SUNXI_CCU_GATE(bus_uart0_clk, "bus-uart0", "apb2", 0x06c, BIT(16), 0); static SUNXI_CCU_GATE(bus_uart1_clk, "bus-uart1", "apb2", From cd11ce209d73cba69a5776818795446a7991755e Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Sat, 11 Nov 2017 17:29:28 +0100 Subject: [PATCH 0308/1013] clk: qcom: common: fix legacy board-clock registration [ Upstream commit 43a51019cc8ff1b1cd2ba72e86563beb40d356fc ] Make sure to search only the child nodes of "/clocks", rather than the whole device-tree depth-first starting at "/clocks" when determining whether to register a fixed clock in the legacy board-clock registration helper. 
Fixes: ee15faffef11 ("clk: qcom: common: Add API to register board clocks backwards compatibly") Signed-off-by: Johan Hovold Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/qcom/common.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/clk/qcom/common.c b/drivers/clk/qcom/common.c index d523991c945f..28ceaf1e9937 100644 --- a/drivers/clk/qcom/common.c +++ b/drivers/clk/qcom/common.c @@ -143,8 +143,10 @@ static int _qcom_cc_register_board_clk(struct device *dev, const char *path, int ret; clocks_node = of_find_node_by_path("/clocks"); - if (clocks_node) - node = of_find_node_by_name(clocks_node, path); + if (clocks_node) { + node = of_get_child_by_name(clocks_node, path); + of_node_put(clocks_node); + } if (!node) { fixed = devm_kzalloc(dev, sizeof(*fixed), GFP_KERNEL); From a967ab0f7338e84d379b7a764bf446221de21d3e Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 5 Oct 2017 11:32:59 +0900 Subject: [PATCH 0309/1013] clk: uniphier: fix DAPLL2 clock rate of Pro5 [ Upstream commit 67affb78a4e4feb837953e3434c8402a5c3b272f ] The parent of DAPLL2 should be DAPLL1. Fix the clock connection. Signed-off-by: Masahiro Yamada Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/uniphier/clk-uniphier-sys.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/uniphier/clk-uniphier-sys.c b/drivers/clk/uniphier/clk-uniphier-sys.c index 07f3b91a7daf..d244e724e198 100644 --- a/drivers/clk/uniphier/clk-uniphier-sys.c +++ b/drivers/clk/uniphier/clk-uniphier-sys.c @@ -123,7 +123,7 @@ const struct uniphier_clk_data uniphier_sld8_sys_clk_data[] = { const struct uniphier_clk_data uniphier_pro5_sys_clk_data[] = { UNIPHIER_CLK_FACTOR("spll", -1, "ref", 120, 1), /* 2400 MHz */ UNIPHIER_CLK_FACTOR("dapll1", -1, "ref", 128, 1), /* 2560 MHz */ - UNIPHIER_CLK_FACTOR("dapll2", -1, "ref", 144, 125), /* 2949.12 MHz */ + UNIPHIER_CLK_FACTOR("dapll2", -1, "dapll1", 144, 125), /* 2949.12 MHz */ UNIPHIER_CLK_FACTOR("uart", 0, "dapll2", 1, 40), UNIPHIER_CLK_FACTOR("i2c", 1, "spll", 1, 48), UNIPHIER_PRO5_SYS_CLK_NAND(2), From 8356c5754cb9854a25e0cebac0ee94a352b3debe Mon Sep 17 00:00:00 2001 From: Zhong Kaihua Date: Mon, 7 Aug 2017 22:51:56 +0800 Subject: [PATCH 0310/1013] clk: hi3660: fix incorrect uart3 clock freqency [ Upstream commit d33fb1b9f0fcb67f2b9f8b1891465a088a9480f8 ] UART3 clock rate is doubled in previous commit. This error is not detected until recently a mezzanine board which makes real use of uart3 port (through LS connector of 96boards) was setup and tested on hi3660-hikey960 board. This patch changes clock source rate of clk_factor_uart3 to 100000000. 
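The arithmetic behind the one-line change, as a quick standalone check (the 1.6 GHz iomcu_peri0 parent rate is inferred from the stated 100 MHz target, not quoted from the driver):

	#include <stdio.h>

	/* Fixed-factor clock: rate = parent * mult / div. */
	int main(void)
	{
		unsigned long parent = 1600000000UL;	/* assumed iomcu_peri0 rate */

		printf("div 8 : %lu Hz\n", parent / 8);		/* 200000000, double the target */
		printf("div 16: %lu Hz\n", parent / 16);	/* 100000000, the intended rate */
		return 0;
	}
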
Signed-off-by: Zhong Kaihua Signed-off-by: Guodong Xu Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/hisilicon/clk-hi3660.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/hisilicon/clk-hi3660.c b/drivers/clk/hisilicon/clk-hi3660.c index a18258eb89cb..f40419959656 100644 --- a/drivers/clk/hisilicon/clk-hi3660.c +++ b/drivers/clk/hisilicon/clk-hi3660.c @@ -34,7 +34,7 @@ static const struct hisi_fixed_rate_clock hi3660_fixed_rate_clks[] = { /* crgctrl */ static const struct hisi_fixed_factor_clock hi3660_crg_fixed_factor_clks[] = { - { HI3660_FACTOR_UART3, "clk_factor_uart3", "iomcu_peri0", 1, 8, 0, }, + { HI3660_FACTOR_UART3, "clk_factor_uart3", "iomcu_peri0", 1, 16, 0, }, { HI3660_CLK_FACTOR_MMC, "clk_factor_mmc", "clkin_sys", 1, 6, 0, }, { HI3660_CLK_GATE_I2C0, "clk_gate_i2c0", "clk_i2c0_iomcu", 1, 4, 0, }, { HI3660_CLK_GATE_I2C1, "clk_gate_i2c1", "clk_i2c1_iomcu", 1, 4, 0, }, From e5aa0e86f957709fc72d02ed3d5d0d775a2592f6 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Thu, 28 Sep 2017 11:18:53 +0100 Subject: [PATCH 0311/1013] mailbox: mailbox-test: don't rely on rx_buffer content to signal data ready [ Upstream commit e339c80af95e14de3712d69ddea09a3868fa14cd ] Currently we rely on the first byte of the Rx buffer to check if there's any data available to be read. If the first byte of the received buffer is zero (i.e. null character), then we fail to signal that data is available even when it's available. Instead introduce a boolean variable to track the data availability and update it in the channel receive callback as ready and clear it when the data is read. Signed-off-by: Sudeep Holla Signed-off-by: Jassi Brar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mailbox/mailbox-test.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/mailbox/mailbox-test.c b/drivers/mailbox/mailbox-test.c index 97fb956bb6e0..93f3d4d61fa7 100644 --- a/drivers/mailbox/mailbox-test.c +++ b/drivers/mailbox/mailbox-test.c @@ -30,6 +30,7 @@ #define MBOX_HEXDUMP_MAX_LEN (MBOX_HEXDUMP_LINE_LEN * \ (MBOX_MAX_MSG_LEN / MBOX_BYTES_PER_LINE)) +static bool mbox_data_ready; static struct dentry *root_debugfs_dir; struct mbox_test_device { @@ -152,16 +153,14 @@ static ssize_t mbox_test_message_write(struct file *filp, static bool mbox_test_message_data_ready(struct mbox_test_device *tdev) { - unsigned char data; + bool data_ready; unsigned long flags; spin_lock_irqsave(&tdev->lock, flags); - data = tdev->rx_buffer[0]; + data_ready = mbox_data_ready; spin_unlock_irqrestore(&tdev->lock, flags); - if (data != '\0') - return true; - return false; + return data_ready; } static ssize_t mbox_test_message_read(struct file *filp, char __user *userbuf, @@ -223,6 +222,7 @@ static ssize_t mbox_test_message_read(struct file *filp, char __user *userbuf, *(touser + l) = '\0'; memset(tdev->rx_buffer, 0, MBOX_MAX_MSG_LEN); + mbox_data_ready = false; spin_unlock_irqrestore(&tdev->lock, flags); @@ -292,6 +292,7 @@ static void mbox_test_receive_message(struct mbox_client *client, void *message) message, MBOX_MAX_MSG_LEN); memcpy(tdev->rx_buffer, message, MBOX_MAX_MSG_LEN); } + mbox_data_ready = true; spin_unlock_irqrestore(&tdev->lock, flags); wake_up_interruptible(&tdev->waitq); From 8928998d1ed8c4bf76d0ce75570665982266825c Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Sat, 30 Sep 2017 10:10:09 +0900 Subject: [PATCH 0312/1013] kbuild: rpm-pkg: fix jobserver unavailable warning [ 
Upstream commit 606625be47bc87b6fab0af10cd57aaa675cb9e42 ] If "make rpm-pkg" or "make binrpm-pkg" is run with -j[jobs] option, the following warning message is displayed. warning: jobserver unavailable: using -j1. Add '+' to parent make rule. Follow the suggestion. Signed-off-by: Masahiro Yamada Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- scripts/package/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/scripts/package/Makefile b/scripts/package/Makefile index 2b4aa4c19b21..34de8b953ecf 100644 --- a/scripts/package/Makefile +++ b/scripts/package/Makefile @@ -49,7 +49,7 @@ rpm-pkg rpm: FORCE $(MAKE) clean $(CONFIG_SHELL) $(MKSPEC) >$(objtree)/kernel.spec $(call cmd,src_tar,$(KERNELPATH),kernel.spec) - rpmbuild $(RPMOPTS) --target $(UTS_MACHINE) -ta $(KERNELPATH).tar.gz + +rpmbuild $(RPMOPTS) --target $(UTS_MACHINE) -ta $(KERNELPATH).tar.gz rm $(KERNELPATH).tar.gz kernel.spec # binrpm-pkg @@ -57,7 +57,7 @@ rpm-pkg rpm: FORCE binrpm-pkg: FORCE $(MAKE) KBUILD_SRC= $(CONFIG_SHELL) $(MKSPEC) prebuilt > $(objtree)/binkernel.spec - rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" --target \ + +rpmbuild $(RPMOPTS) --define "_builddir $(objtree)" --target \ $(UTS_MACHINE) -bb $(objtree)/binkernel.spec rm binkernel.spec From bbcedaeba78040aaa0f0d6a7f39add578f81f356 Mon Sep 17 00:00:00 2001 From: Arvind Yadav Date: Tue, 14 Nov 2017 13:42:38 +0530 Subject: [PATCH 0313/1013] atm: horizon: Fix irq release error [ Upstream commit bde533f2ea607cbbbe76ef8738b36243939a7bc2 ] atm_dev_register() can fail here and passed parameters to free irq which is not initialised. Initialization of 'dev->irq' happened after the 'goto out_free_irq'. So using 'irq' insted of 'dev->irq' in free_irq(). Signed-off-by: Arvind Yadav Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/atm/horizon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/atm/horizon.c b/drivers/atm/horizon.c index 7e76b35f422c..e121b8485731 100644 --- a/drivers/atm/horizon.c +++ b/drivers/atm/horizon.c @@ -2803,7 +2803,7 @@ static int hrz_probe(struct pci_dev *pci_dev, return err; out_free_irq: - free_irq(dev->irq, dev); + free_irq(irq, dev); out_free: kfree(dev); out_release: From 2de359062feea90c1e8baa2fa9771ba98d1fadfa Mon Sep 17 00:00:00 2001 From: Jason Baron Date: Mon, 13 Nov 2017 16:48:47 -0500 Subject: [PATCH 0314/1013] jump_label: Invoke jump_label_test() via early_initcall() [ Upstream commit 92ee46efeb505ead3ab06d3c5ce695637ed5f152 ] Fengguang Wu reported that running the rcuperf test during boot can cause the jump_label_test() to hit a WARN_ON(). The issue is that the core jump label code relies on kernel_text_address() to detect when it can no longer update branches that may be contained in __init sections. The kernel_text_address() in turn assumes that if the system_state variable is greter than or equal to SYSTEM_RUNNING then __init sections are no longer valid (since the assumption is that they have been freed). However, when rcuperf is setup to run in early boot it can call kernel_power_off() which sets the system_state to SYSTEM_POWER_OFF. Since rcuperf initialization is invoked via a module_init(), we can make the dependency of jump_label_test() needing to complete before rcuperf explicit by calling it via early_initcall(). Reported-by: Fengguang Wu Signed-off-by: Jason Baron Acked-by: Paul E. 
McKenney Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Steven Rostedt Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1510609727-2238-1-git-send-email-jbaron@akamai.com Signed-off-by: Ingo Molnar Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/jump_label.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/jump_label.c b/kernel/jump_label.c index 0bf2e8f5244a..7c3774ac1d51 100644 --- a/kernel/jump_label.c +++ b/kernel/jump_label.c @@ -769,7 +769,7 @@ static __init int jump_label_test(void) return 0; } -late_initcall(jump_label_test); +early_initcall(jump_label_test); #endif /* STATIC_KEYS_SELFTEST */ #endif /* HAVE_JUMP_LABEL */ From f0e1cd056e99c1ac7ec59f46ad152c11f1b69570 Mon Sep 17 00:00:00 2001 From: Ilya Lesokhin Date: Mon, 13 Nov 2017 10:22:44 +0200 Subject: [PATCH 0315/1013] tls: Use kzalloc for aead_request allocation [ Upstream commit 61ef6da622aa7b66bf92991bd272490eea6c712e ] Use kzalloc for aead_request allocation as we don't set all the bits in the request. Fixes: 3c4d7559159b ('tls: kernel TLS support') Signed-off-by: Ilya Lesokhin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/tls/tls_sw.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c index 7d80040a37b6..f00383a37622 100644 --- a/net/tls/tls_sw.c +++ b/net/tls/tls_sw.c @@ -219,7 +219,7 @@ static int tls_do_encryption(struct tls_context *tls_ctx, struct aead_request *aead_req; int rc; - aead_req = kmalloc(req_size, flags); + aead_req = kzalloc(req_size, flags); if (!aead_req) return -ENOMEM; From 6610b9cb80ad9a58f71940b9658e3116262663db Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Fri, 10 Nov 2017 14:14:06 +1100 Subject: [PATCH 0316/1013] xfrm: Copy policy family in clone_policy [ Upstream commit 0e74aa1d79a5bbc663e03a2804399cae418a0321 ] The syzbot found an ancient bug in the IPsec code. When we cloned a socket policy (for example, for a child TCP socket derived from a listening socket), we did not copy the family field. This results in a live policy with a zero family field. This triggers a BUG_ON check in the af_key code when the cloned policy is retrieved. This patch fixes it by copying the family field over. Reported-by: syzbot Signed-off-by: Herbert Xu Signed-off-by: Steffen Klassert Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 6eb228a70131..2a6093840e7e 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1306,6 +1306,7 @@ static struct xfrm_policy *clone_policy(const struct xfrm_policy *old, int dir) newp->xfrm_nr = old->xfrm_nr; newp->index = old->index; newp->type = old->type; + newp->family = old->family; memcpy(newp->xfrm_vec, old->xfrm_vec, newp->xfrm_nr*sizeof(struct xfrm_tmpl)); spin_lock_bh(&net->xfrm.xfrm_policy_lock); From a76d81af17ceadd8bae46da297d8e24dd8ea0838 Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Mon, 13 Nov 2017 17:32:39 +0800 Subject: [PATCH 0317/1013] f2fs: fix to clear FI_NO_PREALLOC [ Upstream commit 28cfafb73853f0494b06649716687a3ea07681d5 ] We need to clear FI_NO_PREALLOC flag in error path of f2fs_file_write_iter, otherwise we will lose the chance to preallocate blocks in latter write() at one time. 
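The shape of the fix, as an editor's simplified sketch rather than the real f2fs_file_write_iter() (the real function sets the flag conditionally and does more work): a per-inode flag set around the write must be dropped on the early error return as well, otherwise it stays set and suppresses preallocation for every later write on that inode.

	static ssize_t write_iter_sketch(struct kiocb *iocb, struct iov_iter *from)
	{
		struct inode *inode = file_inode(iocb->ki_filp);
		ssize_t ret;

		inode_lock(inode);
		set_inode_flag(inode, FI_NO_PREALLOC);

		ret = f2fs_preallocate_blocks(iocb, from);
		if (ret) {
			/* the fix: clear the sticky flag on this exit path too */
			clear_inode_flag(inode, FI_NO_PREALLOC);
			inode_unlock(inode);
			return ret;
		}

		ret = __generic_file_write_iter(iocb, from);
		clear_inode_flag(inode, FI_NO_PREALLOC);
		inode_unlock(inode);
		return ret;
	}
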
Fixes: dc91de78e5e1 ("f2fs: do not preallocate blocks which has wrong buffer") Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/f2fs/file.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 6ce467872376..b8372095ba0a 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -2697,6 +2697,7 @@ static ssize_t f2fs_file_write_iter(struct kiocb *iocb, struct iov_iter *from) err = f2fs_preallocate_blocks(iocb, from); if (err) { + clear_inode_flag(inode, FI_NO_PREALLOC); inode_unlock(inode); return err; } From 971110c72991c9c4662d0abf48bd6275b7e211f7 Mon Sep 17 00:00:00 2001 From: Sriharsha Basavapatna Date: Fri, 3 Nov 2017 02:39:04 +0530 Subject: [PATCH 0318/1013] bnxt_re: changing the ip address shouldn't affect new connections [ Upstream commit 063fb5bd1a01937094f40169a20e4aa5ca030db1 ] While adding a new gid, the driver currently does not return the context back to the stack. A subsequent del_gid() (e.g, when ip address is changed) doesn't find the right context in the driver and it ends up dropping that request. This results in the HW caching a stale gid entry and traffic fails because of that. Fix by returning the proper context in bnxt_re_add_gid(). Signed-off-by: Sriharsha Basavapatna Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 0d89621d9fe8..b210495ff33c 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -394,6 +394,7 @@ int bnxt_re_add_gid(struct ib_device *ibdev, u8 port_num, ctx->idx = tbl_idx; ctx->refcnt = 1; ctx_tbl[tbl_idx] = ctx; + *context = ctx; return rc; } From 192c689319429672e15d73969fa0230e7379779a Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Thu, 2 Nov 2017 15:22:26 +0200 Subject: [PATCH 0319/1013] IB/mlx4: Increase maximal message size under UD QP [ Upstream commit 5f22a1d87c5315a98981ecf93cd8de226cffe6ca ] Maximal message should be used as a limit to the max message payload allowed, without the headers. The ConnectX-3 check is done against this value includes the headers. When the payload is 4K this will cause the NIC to drop packets. Increase maximal message to 8K as workaround, this shouldn't change current behaviour because we continue to set the MTU to 4k. 
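A quick check of the encoding, assuming (as the hunk below implies) that the low bits of mtu_msgmax carry log2 of the maximal message size while the high bits carry the MTU code:

	#include <stdio.h>

	int main(void)
	{
		printf("log_max_msg 12 -> %u bytes\n", 1U << 12);	/* 4096: headers eat into a 4K payload */
		printf("log_max_msg 13 -> %u bytes\n", 1U << 13);	/* 8192: 4K payload plus headers fits */
		return 0;
	}
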
To reproduce; set MTU to 4296 on the corresponding interface, for example: ifconfig eth0 mtu 4296 (both server and client) On server: ib_send_bw -c UD -d mlx4_0 -s 4096 -n 1000000 -i1 -m 4096 On client: ib_send_bw -d mlx4_0 -c UD -s 4096 -n 1000000 -i 1 -m 4096 Fixes: 6e0d733d9215 ("IB/mlx4: Allow 4K messages for UD QPs") Signed-off-by: Mark Bloch Reviewed-by: Majd Dibbiny Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx4/qp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index b6b33d99b0b4..17e44c86577a 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -2216,7 +2216,7 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, context->mtu_msgmax = (IB_MTU_4096 << 5) | ilog2(dev->dev->caps.max_gso_sz); else - context->mtu_msgmax = (IB_MTU_4096 << 5) | 12; + context->mtu_msgmax = (IB_MTU_4096 << 5) | 13; } else if (attr_mask & IB_QP_PATH_MTU) { if (attr->path_mtu < IB_MTU_256 || attr->path_mtu > IB_MTU_4096) { pr_err("path MTU (%u) is invalid\n", From d702be100eaf043b97687c050565c154a48d763d Mon Sep 17 00:00:00 2001 From: Majd Dibbiny Date: Mon, 30 Oct 2017 14:23:13 +0200 Subject: [PATCH 0320/1013] IB/mlx5: Assign send CQ and recv CQ of UMR QP [ Upstream commit 31fde034a8bd964a5c7c1a5663fc87a913158db2 ] The UMR's QP is created by calling mlx5_ib_create_qp directly, and therefore the send CQ and the recv CQ on the ibqp weren't assigned. Assign them right after calling the mlx5_ib_create_qp to assure that any access to those pointers will work as expected and won't crash the system as might happen as part of reset flow. Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Majd Dibbiny Reviewed-by: Yishai Hadas Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx5/main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 552f7bd4ecc3..5aff1e33d984 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -3097,6 +3097,8 @@ static int create_umr_res(struct mlx5_ib_dev *dev) qp->real_qp = qp; qp->uobject = NULL; qp->qp_type = MLX5_IB_QPT_REG_UMR; + qp->send_cq = init_attr->send_cq; + qp->recv_cq = init_attr->recv_cq; attr->qp_state = IB_QPS_INIT; attr->port_num = 1; From 49e186e3278d0d800021b59286dc97d748593fd6 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 2 Nov 2017 15:27:51 +0000 Subject: [PATCH 0321/1013] afs: Fix total-length calculation for multiple-page send [ Upstream commit 1199db603511d7463d9d3840f96f61967affc766 ] Fix the total-length calculation in afs_make_call() when the operation being dispatched has data from a series of pages attached. Despite the patched code looking like that it should reduce mathematically to the current code, it doesn't because the 32-bit unsigned arithmetic being used to calculate the page-offset-difference doesn't correctly extend to a 64-bit value when the result is effectively negative. Without this, some FS.StoreData operations that span multiple pages fail, reporting too little or too much data. 
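A standalone illustration of the wrap (editor's sketch, not the afs code; the field names mirror the hunk below). With a span running from offset 3000 in the first page to byte 100 of the next page, the intended total is 1196 bytes, but the single-expression form computes the negative head difference in 32-bit unsigned arithmetic and ends up exactly 2^32 too large once widened:

	#include <stdio.h>

	int main(void)
	{
		unsigned int first = 0, last = 1;		/* page indices */
		unsigned int first_offset = 3000, last_to = 100;
		unsigned long long page_size = 4096;

		/* old form: last_to - first_offset wraps before being widened */
		unsigned long long buggy = (last_to - first_offset) +
					   (last - first) * page_size;

		/* fixed form: head remainder + whole middle pages + tail */
		unsigned long long fixed = (page_size - first_offset) + last_to +
					   (unsigned long long)(last - first - 1) * page_size;

		printf("buggy=%llu fixed=%llu\n", buggy, fixed);	/* 4294968492 vs 1196 */
		return 0;
	}
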
Signed-off-by: David Howells Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/afs/rxrpc.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c index 0bf191f0dbaf..9f715c3edcf9 100644 --- a/fs/afs/rxrpc.c +++ b/fs/afs/rxrpc.c @@ -377,8 +377,17 @@ int afs_make_call(struct in_addr *addr, struct afs_call *call, gfp_t gfp, */ tx_total_len = call->request_size; if (call->send_pages) { - tx_total_len += call->last_to - call->first_offset; - tx_total_len += (call->last - call->first) * PAGE_SIZE; + if (call->last == call->first) { + tx_total_len += call->last_to - call->first_offset; + } else { + /* It looks mathematically like you should be able to + * combine the following lines with the ones above, but + * unsigned arithmetic is fun when it wraps... + */ + tx_total_len += PAGE_SIZE - call->first_offset; + tx_total_len += call->last_to; + tx_total_len += (call->last - call->first - 1) * PAGE_SIZE; + } } /* create a call */ From 80e642c066f9882007174be870c1465f69df8257 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 2 Nov 2017 15:27:48 +0000 Subject: [PATCH 0322/1013] afs: Connect up the CB.ProbeUuid [ Upstream commit f4b3526d83c40dd8bf5948b9d7a1b2c340f0dcc8 ] The handler for the CB.ProbeUuid operation in the cache manager is implemented, but isn't listed in the switch-statement of operation selection, so won't be used. Fix this by adding it. Signed-off-by: David Howells Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/afs/cmservice.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index 782d4d05a53b..c7475867a52b 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -127,6 +127,9 @@ bool afs_cm_incoming_call(struct afs_call *call) case CBProbe: call->type = &afs_SRXCBProbe; return true; + case CBProbeUuid: + call->type = &afs_SRXCBProbeUuid; + return true; case CBTellMeAboutYourself: call->type = &afs_SRXCBTellMeAboutYourself; return true; From 5fd159e1ee6a87a72626139813034f24f047d0e6 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 14 Dec 2017 09:53:15 +0100 Subject: [PATCH 0323/1013] Linux 4.14.6 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 43ac7bdb10ad..eabbd7748a24 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 5 +SUBLEVEL = 6 EXTRAVERSION = NAME = Petit Gorille From fc038e59f316d2255f6d3c05d0d6b117d4739574 Mon Sep 17 00:00:00 2001 From: Sebastian Sjoholm Date: Mon, 20 Nov 2017 19:05:17 +0100 Subject: [PATCH 0324/1013] net: qmi_wwan: add Quectel BG96 2c7c:0296 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit f9409e7f086fa6c4623769b4b2f4f17a024d8143 ] Quectel BG96 is an Qualcomm MDM9206 based IoT modem, supporting both CAT-M and NB-IoT. Tested hardware is BG96 mounted on Quectel development board (EVB). The USB id is added to qmi_wwan.c to allow QMI communication with the BG96. Signed-off-by: Sebastian Sjoholm Acked-by: Bjørn Mork Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 8d4a6f7cba61..f4ed553929f0 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1239,6 +1239,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1e0e, 0x9001, 5)}, /* SIMCom 7230E */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0125, 4)}, /* Quectel EC25, EC20 R2.0 Mini PCIe */ {QMI_QUIRK_SET_DTR(0x2c7c, 0x0121, 4)}, /* Quectel EC21 Mini PCIe */ + {QMI_FIXED_INTF(0x2c7c, 0x0296, 4)}, /* Quectel BG96 */ /* 4. Gobi 1000 devices */ {QMI_GOBI1K_DEVICE(0x05c6, 0x9212)}, /* Acer Gobi Modem Device */ From fb89f5b05ab239ea14f87c02fcac05c957818f90 Mon Sep 17 00:00:00 2001 From: Sunil Goutham Date: Thu, 23 Nov 2017 22:34:31 +0300 Subject: [PATCH 0325/1013] net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts [ Upstream commit fa6d7cb5d76cf0467c61420fc9238045aedfd379 ] Don't offload IP header checksum to NIC. This fixes a previous patch which enabled checksum offloading for both IPv4 and IPv6 packets. So L3 checksum offload was getting enabled for IPv6 pkts. And HW is dropping these pkts as it assumes the pkt is IPv4 when IP csum offload is set in the SQ descriptor. Fixes: 3a9024f52c2e ("net: thunderx: Enable TSO and checksum offloads for ipv6") Signed-off-by: Sunil Goutham Signed-off-by: Aleksey Makarov Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index d4496e9afcdf..8b2c31e2a2b0 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -1355,7 +1355,6 @@ nicvf_sq_add_hdr_subdesc(struct nicvf *nic, struct snd_queue *sq, int qentry, /* Offload checksum calculation to HW */ if (skb->ip_summed == CHECKSUM_PARTIAL) { - hdr->csum_l3 = 1; /* Enable IP csum calculation */ hdr->l3_offset = skb_network_offset(skb); hdr->l4_offset = skb_transport_offset(skb); From ec9a6722173d3dbbbbcc5eec2f5b0f2c58057bc9 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 6 Dec 2017 01:04:50 +0100 Subject: [PATCH 0326/1013] net: thunderx: Fix TCP/UDP checksum offload for IPv4 pkts [ Upstream commit 134059fd2775be79e26c2dff87d25cc2f6ea5626 ] Offload IP header checksum to NIC. This fixes a previous patch which disabled checksum offloading for both IPv4 and IPv6 packets. So L3 checksum offload was getting disabled for IPv4 pkts. And HW is dropping these pkts for some reason. Without this patch, IPv4 TSO appears to be broken: WIthout this patch I get ~16kbyte/s, with patch close to 2mbyte/s when copying files via scp from test box to my home workstation. Looking at tcpdump on sender it looks like hardware drops IPv4 TSO skbs. This patch restores performance for me, ipv6 looks good too. Fixes: fa6d7cb5d76c ("net: thunderx: Fix TCP/UDP checksum offload for IPv6 pkts") Cc: Sunil Goutham Cc: Aleksey Makarov Cc: Eric Dumazet Signed-off-by: Florian Westphal Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cavium/thunder/nicvf_queues.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c index 8b2c31e2a2b0..a3d12dbde95b 100644 --- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c +++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c @@ -1355,6 +1355,8 @@ nicvf_sq_add_hdr_subdesc(struct nicvf *nic, struct snd_queue *sq, int qentry, /* Offload checksum calculation to HW */ if (skb->ip_summed == CHECKSUM_PARTIAL) { + if (ip.v4->version == 4) + hdr->csum_l3 = 1; /* Enable IP csum calculation */ hdr->l3_offset = skb_network_offset(skb); hdr->l4_offset = skb_transport_offset(skb); From c7203f55d53f9237bda0c0b6cf02f22f18262a5b Mon Sep 17 00:00:00 2001 From: Tobias Jakobi Date: Tue, 21 Nov 2017 16:15:57 +0100 Subject: [PATCH 0327/1013] net: realtek: r8169: implement set_link_ksettings() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 9e77d7a5549dc4d4999a60676373ab3fd1dae4db ] Commit 6fa1ba61520576cf1346c4ff09a056f2950cb3bf partially implemented the new ethtool API, by replacing get_settings() with get_link_ksettings(). This breaks ethtool, since the userspace tool (according to the new API specs) never tries the legacy set() call, when the new get() call succeeds. All attempts to chance some setting from userspace result in: > Cannot set new settings: Operation not supported Implement the missing set() call. Signed-off-by: Tobias Jakobi Tested-by: Holger Hoffstätte Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/realtek/r8169.c | 38 ++++++++++++++++------------ 1 file changed, 22 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index a3c949ea7d1a..9541465e43e9 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -2025,21 +2025,6 @@ static int rtl8169_set_speed(struct net_device *dev, return ret; } -static int rtl8169_set_settings(struct net_device *dev, struct ethtool_cmd *cmd) -{ - struct rtl8169_private *tp = netdev_priv(dev); - int ret; - - del_timer_sync(&tp->timer); - - rtl_lock_work(tp); - ret = rtl8169_set_speed(dev, cmd->autoneg, ethtool_cmd_speed(cmd), - cmd->duplex, cmd->advertising); - rtl_unlock_work(tp); - - return ret; -} - static netdev_features_t rtl8169_fix_features(struct net_device *dev, netdev_features_t features) { @@ -2166,6 +2151,27 @@ static int rtl8169_get_link_ksettings(struct net_device *dev, return rc; } +static int rtl8169_set_link_ksettings(struct net_device *dev, + const struct ethtool_link_ksettings *cmd) +{ + struct rtl8169_private *tp = netdev_priv(dev); + int rc; + u32 advertising; + + if (!ethtool_convert_link_mode_to_legacy_u32(&advertising, + cmd->link_modes.advertising)) + return -EINVAL; + + del_timer_sync(&tp->timer); + + rtl_lock_work(tp); + rc = rtl8169_set_speed(dev, cmd->base.autoneg, cmd->base.speed, + cmd->base.duplex, advertising); + rtl_unlock_work(tp); + + return rc; +} + static void rtl8169_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *p) { @@ -2367,7 +2373,6 @@ static const struct ethtool_ops rtl8169_ethtool_ops = { .get_drvinfo = rtl8169_get_drvinfo, .get_regs_len = rtl8169_get_regs_len, .get_link = ethtool_op_get_link, - .set_settings = rtl8169_set_settings, .get_msglevel = rtl8169_get_msglevel, .set_msglevel = 
rtl8169_set_msglevel, .get_regs = rtl8169_get_regs, @@ -2379,6 +2384,7 @@ static const struct ethtool_ops rtl8169_ethtool_ops = { .get_ts_info = ethtool_op_get_ts_info, .nway_reset = rtl8169_nway_reset, .get_link_ksettings = rtl8169_get_link_ksettings, + .set_link_ksettings = rtl8169_set_link_ksettings, }; static void rtl8169_get_mac_version(struct rtl8169_private *tp, From 6efcd7eada3e76096e574e27d4e56051ffd328ab Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Wed, 18 Oct 2017 17:40:17 +0200 Subject: [PATCH 0328/1013] s390/qeth: fix early exit from error path [ Upstream commit 83cf79a2fec3cf499eb6cb9eb608656fc2a82776 ] When the allocation of the addr buffer fails, we need to free our refcount on the inetdevice before returning. Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_l3_main.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index ab661a431f7c..1d8c317553ba 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -1553,7 +1553,7 @@ static void qeth_l3_free_vlan_addresses4(struct qeth_card *card, addr = qeth_l3_get_addr_buffer(QETH_PROT_IPV4); if (!addr) - return; + goto out; spin_lock_bh(&card->ip_lock); @@ -1567,6 +1567,7 @@ static void qeth_l3_free_vlan_addresses4(struct qeth_card *card, spin_unlock_bh(&card->ip_lock); kfree(addr); +out: in_dev_put(in_dev); } @@ -1591,7 +1592,7 @@ static void qeth_l3_free_vlan_addresses6(struct qeth_card *card, addr = qeth_l3_get_addr_buffer(QETH_PROT_IPV6); if (!addr) - return; + goto out; spin_lock_bh(&card->ip_lock); @@ -1606,6 +1607,7 @@ static void qeth_l3_free_vlan_addresses6(struct qeth_card *card, spin_unlock_bh(&card->ip_lock); kfree(addr); +out: in6_dev_put(in6_dev); #endif /* CONFIG_QETH_IPV6 */ } From 1933fa485194e697b4d90853f338029ab19e4a72 Mon Sep 17 00:00:00 2001 From: Jon Maloy Date: Mon, 4 Dec 2017 22:00:20 +0100 Subject: [PATCH 0329/1013] tipc: fix memory leak in tipc_accept_from_sock() [ Upstream commit a7d5f107b4978e08eeab599ee7449af34d034053 ] When the function tipc_accept_from_sock() fails to create an instance of struct tipc_subscriber it omits to free the already created instance of struct tipc_conn instance before it returns. We fix that with this commit. Reported-by: David S. Miller Signed-off-by: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/server.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/tipc/server.c b/net/tipc/server.c index 3cd6402e812c..f4c1b18c5fb0 100644 --- a/net/tipc/server.c +++ b/net/tipc/server.c @@ -313,6 +313,7 @@ static int tipc_accept_from_sock(struct tipc_conn *con) newcon->usr_data = s->tipc_conn_new(newcon->conid); if (!newcon->usr_data) { sock_release(newsock); + conn_put(newcon); return -ENOMEM; } From dee5b428c3b71d7201d49afd38a110fccb352922 Mon Sep 17 00:00:00 2001 From: Wei Xu Date: Fri, 1 Dec 2017 05:10:36 -0500 Subject: [PATCH 0330/1013] vhost: fix skb leak in handle_rx() [ Upstream commit 6e474083f3daf3a3546737f5d7d502ad12eb257c ] Matthew found a roughly 40% tcp throughput regression with commit c67df11f(vhost_net: try batch dequing from skb array) as discussed in the following thread: https://www.mail-archive.com/netdev@vger.kernel.org/msg187936.html Eventually we figured out that it was a skb leak in handle_rx() when sending packets to the VM. 
This usually happens when a guest can not drain out vq as fast as vhost fills in, afterwards it sets off the traffic jam and leaks skb(s) which occurs as no headcount to send on the vq from vhost side. This can be avoided by making sure we have got enough headcount before actually consuming a skb from the batched rx array while transmitting, which is simply done by moving checking the zero headcount a bit ahead. Signed-off-by: Wei Xu Reported-by: Matthew Rosato Acked-by: Michael S. Tsirkin Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/vhost/net.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 58585ec8699e..bd15309ac5f1 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -782,16 +782,6 @@ static void handle_rx(struct vhost_net *net) /* On error, stop handling until the next kick. */ if (unlikely(headcount < 0)) goto out; - if (nvq->rx_array) - msg.msg_control = vhost_net_buf_consume(&nvq->rxq); - /* On overrun, truncate and discard */ - if (unlikely(headcount > UIO_MAXIOV)) { - iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); - err = sock->ops->recvmsg(sock, &msg, - 1, MSG_DONTWAIT | MSG_TRUNC); - pr_debug("Discarded rx packet: len %zd\n", sock_len); - continue; - } /* OK, now we need to know about added descriptors. */ if (!headcount) { if (unlikely(vhost_enable_notify(&net->dev, vq))) { @@ -804,6 +794,16 @@ static void handle_rx(struct vhost_net *net) * they refilled. */ goto out; } + if (nvq->rx_array) + msg.msg_control = vhost_net_buf_consume(&nvq->rxq); + /* On overrun, truncate and discard */ + if (unlikely(headcount > UIO_MAXIOV)) { + iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1); + err = sock->ops->recvmsg(sock, &msg, + 1, MSG_DONTWAIT | MSG_TRUNC); + pr_debug("Discarded rx packet: len %zd\n", sock_len); + continue; + } /* We don't need to be notified again. */ iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len); fixup = msg.msg_iter; From 9d9a63d74b2b6ed7c30c7d1584d87c16ae8d5862 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?H=C3=A5kon=20Bugge?= Date: Wed, 6 Dec 2017 17:18:28 +0100 Subject: [PATCH 0331/1013] rds: Fix NULL pointer dereference in __rds_rdma_map MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit f3069c6d33f6ae63a1668737bc78aaaa51bff7ca ] This is a fix for syzkaller719569, where memory registration was attempted without any underlying transport being loaded. Analysis of the case reveals that it is the setsockopt() RDS_GET_MR (2) and RDS_GET_MR_FOR_DEST (7) that are vulnerable. 
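The class of bug, shown as an illustrative fragment only (hypothetical types and names, not the rds code): a transport ops pointer that is populated only once a transport is attached gets dereferenced on a setsockopt path that can run before any attachment, so the guard has to come first.

	#include <linux/errno.h>

	struct xport_ops { int (*get_mr)(void *sock); };	/* hypothetical */
	struct sock_state { struct xport_ops *transport; };	/* hypothetical */

	static int do_get_mr(struct sock_state *rs)
	{
		if (!rs->transport)	/* the missing check: no transport loaded yet */
			return -ENOTCONN;
		return rs->transport->get_mr(rs);
	}
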
Here is an example stack trace when the bug is hit: BUG: unable to handle kernel NULL pointer dereference at 00000000000000c0 IP: __rds_rdma_map+0x36/0x440 [rds] PGD 2f93d03067 P4D 2f93d03067 PUD 2f93d02067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: bridge stp llc tun rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache rds binfmt_misc sb_edac intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul c rc32_pclmul ghash_clmulni_intel pcbc aesni_intel crypto_simd glue_helper cryptd iTCO_wdt mei_me sg iTCO_vendor_support ipmi_si mei ipmi_devintf nfsd shpchp pcspkr i2c_i801 ioatd ma ipmi_msghandler wmi lpc_ich mfd_core auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ext4 mbcache jbd2 mgag200 i2c_algo_bit drm_kms_helper ixgbe syscopyarea ahci sysfillrect sysimgblt libahci mdio fb_sys_fops ttm ptp libata sd_mod mlx4_core drm crc32c_intel pps_core megaraid_sas i2c_core dca dm_mirror dm_region_hash dm_log dm_mod CPU: 48 PID: 45787 Comm: repro_set2 Not tainted 4.14.2-3.el7uek.x86_64 #2 Hardware name: Oracle Corporation ORACLE SERVER X5-2L/ASM,MOBO TRAY,2U, BIOS 31110000 03/03/2017 task: ffff882f9190db00 task.stack: ffffc9002b994000 RIP: 0010:__rds_rdma_map+0x36/0x440 [rds] RSP: 0018:ffffc9002b997df0 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff882fa2182580 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffc9002b997e40 RDI: ffff882fa2182580 RBP: ffffc9002b997e30 R08: 0000000000000000 R09: 0000000000000002 R10: ffff885fb29e3838 R11: 0000000000000000 R12: ffff882fa2182580 R13: ffff882fa2182580 R14: 0000000000000002 R15: 0000000020000ffc FS: 00007fbffa20b700(0000) GS:ffff882fbfb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000c0 CR3: 0000002f98a66006 CR4: 00000000001606e0 Call Trace: rds_get_mr+0x56/0x80 [rds] rds_setsockopt+0x172/0x340 [rds] ? __fget_light+0x25/0x60 ? __fdget+0x13/0x20 SyS_setsockopt+0x80/0xe0 do_syscall_64+0x67/0x1b0 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7fbff9b117f9 RSP: 002b:00007fbffa20aed8 EFLAGS: 00000293 ORIG_RAX: 0000000000000036 RAX: ffffffffffffffda RBX: 00000000000c84a4 RCX: 00007fbff9b117f9 RDX: 0000000000000002 RSI: 0000400000000114 RDI: 000000000000109b RBP: 00007fbffa20af10 R08: 0000000000000020 R09: 00007fbff9dd7860 R10: 0000000020000ffc R11: 0000000000000293 R12: 0000000000000000 R13: 00007fbffa20b9c0 R14: 00007fbffa20b700 R15: 0000000000000021 Code: 41 56 41 55 49 89 fd 41 54 53 48 83 ec 18 8b 87 f0 02 00 00 48 89 55 d0 48 89 4d c8 85 c0 0f 84 2d 03 00 00 48 8b 87 00 03 00 00 <48> 83 b8 c0 00 00 00 00 0f 84 25 03 00 0 0 48 8b 06 48 8b 56 08 The fix is to check the existence of an underlying transport in __rds_rdma_map(). Signed-off-by: Håkon Bugge Reported-by: syzbot Acked-by: Santosh Shilimkar Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/rds/rdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/rds/rdma.c b/net/rds/rdma.c index 8886f15abe90..bc2f1e0977d6 100644 --- a/net/rds/rdma.c +++ b/net/rds/rdma.c @@ -183,7 +183,7 @@ static int __rds_rdma_map(struct rds_sock *rs, struct rds_get_mr_args *args, long i; int ret; - if (rs->rs_bound_addr == 0) { + if (rs->rs_bound_addr == 0 || !rs->rs_transport) { ret = -ENOTCONN; /* XXX not a great errno */ goto out; } From dacf127383fc2dabd2befbead91cbbf22c3dc2ab Mon Sep 17 00:00:00 2001 From: Hangbin Liu Date: Thu, 30 Nov 2017 10:41:14 +0800 Subject: [PATCH 0332/1013] sit: update frag_off info [ Upstream commit f859b4af1c52493ec21173ccc73d0b60029b5b88 ] After parsing the sit netlink change info, we forget to update frag_off in ipip6_tunnel_update(). Fix it by assigning frag_off with new value. Reported-by: Jianlin Shi Signed-off-by: Hangbin Liu Acked-by: Nicolas Dichtel Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/sit.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c index ac912bb21747..e79854cc5790 100644 --- a/net/ipv6/sit.c +++ b/net/ipv6/sit.c @@ -1087,6 +1087,7 @@ static void ipip6_tunnel_update(struct ip_tunnel *t, struct ip_tunnel_parm *p, ipip6_tunnel_link(sitn, t); t->parms.iph.ttl = p->iph.ttl; t->parms.iph.tos = p->iph.tos; + t->parms.iph.frag_off = p->iph.frag_off; if (t->parms.link != p->link || t->fwmark != fwmark) { t->parms.link = p->link; t->fwmark = fwmark; From b604eb8deaed5be43c0402008d0a76cedaacf940 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sun, 3 Dec 2017 09:32:59 -0800 Subject: [PATCH 0333/1013] tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb() [ Upstream commit eeea10b83a139451130df1594f26710c8fa390c8 ] James Morris reported kernel stack corruption bug [1] while running the SELinux testsuite, and bisected to a recent commit bffa72cf7f9d ("net: sk_buff rbnode reorg") We believe this commit is fine, but exposes an older bug. SELinux code runs from tcp_filter() and might send an ICMP, expecting IP options to be found in skb->cb[] using regular IPCB placement. We need to defer TCP mangling of skb->cb[] after tcp_filter() calls. This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very similar way we added them for IPv6. [1] [ 339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet [ 339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5 [ 339.822505] [ 339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15 [ 339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A 01/19/2017 [ 339.885060] Call Trace: [ 339.896875] [ 339.908103] dump_stack+0x63/0x87 [ 339.920645] panic+0xe8/0x248 [ 339.932668] ? ip_push_pending_frames+0x33/0x40 [ 339.946328] ? icmp_send+0x525/0x530 [ 339.958861] ? kfree_skbmem+0x60/0x70 [ 339.971431] __stack_chk_fail+0x1b/0x20 [ 339.984049] icmp_send+0x525/0x530 [ 339.996205] ? netlbl_skbuff_err+0x36/0x40 [ 340.008997] ? selinux_netlbl_err+0x11/0x20 [ 340.021816] ? selinux_socket_sock_rcv_skb+0x211/0x230 [ 340.035529] ? security_sock_rcv_skb+0x3b/0x50 [ 340.048471] ? sk_filter_trim_cap+0x44/0x1c0 [ 340.061246] ? tcp_v4_inbound_md5_hash+0x69/0x1b0 [ 340.074562] ? tcp_filter+0x2c/0x40 [ 340.086400] ? tcp_v4_rcv+0x820/0xa20 [ 340.098329] ? ip_local_deliver_finish+0x71/0x1a0 [ 340.111279] ? ip_local_deliver+0x6f/0xe0 [ 340.123535] ? ip_rcv_finish+0x3a0/0x3a0 [ 340.135523] ? ip_rcv_finish+0xdb/0x3a0 [ 340.147442] ? 
ip_rcv+0x27c/0x3c0 [ 340.158668] ? inet_del_offload+0x40/0x40 [ 340.170580] ? __netif_receive_skb_core+0x4ac/0x900 [ 340.183285] ? rcu_accelerate_cbs+0x5b/0x80 [ 340.195282] ? __netif_receive_skb+0x18/0x60 [ 340.207288] ? process_backlog+0x95/0x140 [ 340.218948] ? net_rx_action+0x26c/0x3b0 [ 340.230416] ? __do_softirq+0xc9/0x26a [ 340.241625] ? do_softirq_own_stack+0x2a/0x40 [ 340.253368] [ 340.262673] ? do_softirq+0x50/0x60 [ 340.273450] ? __local_bh_enable_ip+0x57/0x60 [ 340.285045] ? ip_finish_output2+0x175/0x350 [ 340.296403] ? ip_finish_output+0x127/0x1d0 [ 340.307665] ? nf_hook_slow+0x3c/0xb0 [ 340.318230] ? ip_output+0x72/0xe0 [ 340.328524] ? ip_fragment.constprop.54+0x80/0x80 [ 340.340070] ? ip_local_out+0x35/0x40 [ 340.350497] ? ip_queue_xmit+0x15c/0x3f0 [ 340.361060] ? __kmalloc_reserve.isra.40+0x31/0x90 [ 340.372484] ? __skb_clone+0x2e/0x130 [ 340.382633] ? tcp_transmit_skb+0x558/0xa10 [ 340.393262] ? tcp_connect+0x938/0xad0 [ 340.403370] ? ktime_get_with_offset+0x4c/0xb0 [ 340.414206] ? tcp_v4_connect+0x457/0x4e0 [ 340.424471] ? __inet_stream_connect+0xb3/0x300 [ 340.435195] ? inet_stream_connect+0x3b/0x60 [ 340.445607] ? SYSC_connect+0xd9/0x110 [ 340.455455] ? __audit_syscall_entry+0xaf/0x100 [ 340.466112] ? syscall_trace_enter+0x1d0/0x2b0 [ 340.476636] ? __audit_syscall_exit+0x209/0x290 [ 340.487151] ? SyS_connect+0xe/0x10 [ 340.496453] ? do_syscall_64+0x67/0x1b0 [ 340.506078] ? entry_SYSCALL64_slow_path+0x25/0x25 Fixes: 971f10eca186 ("tcp: better TCP_SKB_CB layout to reduce cache line misses") Signed-off-by: Eric Dumazet Reported-by: James Morris Tested-by: James Morris Tested-by: Casey Schaufler Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_ipv4.c | 59 ++++++++++++++++++++++++++++++--------------- net/ipv6/tcp_ipv6.c | 10 +++++--- 2 files changed, 46 insertions(+), 23 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 5b027c69cbc5..5a5ed4f14678 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -1587,6 +1587,34 @@ int tcp_filter(struct sock *sk, struct sk_buff *skb) } EXPORT_SYMBOL(tcp_filter); +static void tcp_v4_restore_cb(struct sk_buff *skb) +{ + memmove(IPCB(skb), &TCP_SKB_CB(skb)->header.h4, + sizeof(struct inet_skb_parm)); +} + +static void tcp_v4_fill_cb(struct sk_buff *skb, const struct iphdr *iph, + const struct tcphdr *th) +{ + /* This is tricky : We move IPCB at its correct location into TCP_SKB_CB() + * barrier() makes sure compiler wont play fool^Waliasing games. + */ + memmove(&TCP_SKB_CB(skb)->header.h4, IPCB(skb), + sizeof(struct inet_skb_parm)); + barrier(); + + TCP_SKB_CB(skb)->seq = ntohl(th->seq); + TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + + skb->len - th->doff * 4); + TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); + TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th); + TCP_SKB_CB(skb)->tcp_tw_isn = 0; + TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph); + TCP_SKB_CB(skb)->sacked = 0; + TCP_SKB_CB(skb)->has_rxtstamp = + skb->tstamp || skb_hwtstamps(skb)->hwtstamp; +} + /* * From tcp_input.c */ @@ -1627,24 +1655,6 @@ int tcp_v4_rcv(struct sk_buff *skb) th = (const struct tcphdr *)skb->data; iph = ip_hdr(skb); - /* This is tricky : We move IPCB at its correct location into TCP_SKB_CB() - * barrier() makes sure compiler wont play fool^Waliasing games. 
- */ - memmove(&TCP_SKB_CB(skb)->header.h4, IPCB(skb), - sizeof(struct inet_skb_parm)); - barrier(); - - TCP_SKB_CB(skb)->seq = ntohl(th->seq); - TCP_SKB_CB(skb)->end_seq = (TCP_SKB_CB(skb)->seq + th->syn + th->fin + - skb->len - th->doff * 4); - TCP_SKB_CB(skb)->ack_seq = ntohl(th->ack_seq); - TCP_SKB_CB(skb)->tcp_flags = tcp_flag_byte(th); - TCP_SKB_CB(skb)->tcp_tw_isn = 0; - TCP_SKB_CB(skb)->ip_dsfield = ipv4_get_dsfield(iph); - TCP_SKB_CB(skb)->sacked = 0; - TCP_SKB_CB(skb)->has_rxtstamp = - skb->tstamp || skb_hwtstamps(skb)->hwtstamp; - lookup: sk = __inet_lookup_skb(&tcp_hashinfo, skb, __tcp_hdrlen(th), th->source, th->dest, sdif, &refcounted); @@ -1675,14 +1685,19 @@ int tcp_v4_rcv(struct sk_buff *skb) sock_hold(sk); refcounted = true; nsk = NULL; - if (!tcp_filter(sk, skb)) + if (!tcp_filter(sk, skb)) { + th = (const struct tcphdr *)skb->data; + iph = ip_hdr(skb); + tcp_v4_fill_cb(skb, iph, th); nsk = tcp_check_req(sk, skb, req, false); + } if (!nsk) { reqsk_put(req); goto discard_and_relse; } if (nsk == sk) { reqsk_put(req); + tcp_v4_restore_cb(skb); } else if (tcp_child_process(sk, nsk, skb)) { tcp_v4_send_reset(nsk, skb); goto discard_and_relse; @@ -1708,6 +1723,7 @@ int tcp_v4_rcv(struct sk_buff *skb) goto discard_and_relse; th = (const struct tcphdr *)skb->data; iph = ip_hdr(skb); + tcp_v4_fill_cb(skb, iph, th); skb->dev = NULL; @@ -1738,6 +1754,8 @@ int tcp_v4_rcv(struct sk_buff *skb) if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) goto discard_it; + tcp_v4_fill_cb(skb, iph, th); + if (tcp_checksum_complete(skb)) { csum_error: __TCP_INC_STATS(net, TCP_MIB_CSUMERRORS); @@ -1764,6 +1782,8 @@ int tcp_v4_rcv(struct sk_buff *skb) goto discard_it; } + tcp_v4_fill_cb(skb, iph, th); + if (tcp_checksum_complete(skb)) { inet_twsk_put(inet_twsk(sk)); goto csum_error; @@ -1780,6 +1800,7 @@ int tcp_v4_rcv(struct sk_buff *skb) if (sk2) { inet_twsk_deschedule_put(inet_twsk(sk)); sk = sk2; + tcp_v4_restore_cb(skb); refcounted = false; goto process; } diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 64d94afa427f..067350ac843a 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1448,7 +1448,6 @@ static int tcp_v6_rcv(struct sk_buff *skb) struct sock *nsk; sk = req->rsk_listener; - tcp_v6_fill_cb(skb, hdr, th); if (tcp_v6_inbound_md5_hash(sk, skb)) { sk_drops_add(sk, skb); reqsk_put(req); @@ -1461,8 +1460,12 @@ static int tcp_v6_rcv(struct sk_buff *skb) sock_hold(sk); refcounted = true; nsk = NULL; - if (!tcp_filter(sk, skb)) + if (!tcp_filter(sk, skb)) { + th = (const struct tcphdr *)skb->data; + hdr = ipv6_hdr(skb); + tcp_v6_fill_cb(skb, hdr, th); nsk = tcp_check_req(sk, skb, req, false); + } if (!nsk) { reqsk_put(req); goto discard_and_relse; @@ -1486,8 +1489,6 @@ static int tcp_v6_rcv(struct sk_buff *skb) if (!xfrm6_policy_check(sk, XFRM_POLICY_IN, skb)) goto discard_and_relse; - tcp_v6_fill_cb(skb, hdr, th); - if (tcp_v6_inbound_md5_hash(sk, skb)) goto discard_and_relse; @@ -1495,6 +1496,7 @@ static int tcp_v6_rcv(struct sk_buff *skb) goto discard_and_relse; th = (const struct tcphdr *)skb->data; hdr = ipv6_hdr(skb); + tcp_v6_fill_cb(skb, hdr, th); skb->dev = NULL; From 7263f11b567193235a676584dfefd0b46c8b08a1 Mon Sep 17 00:00:00 2001 From: Mike Maloney Date: Tue, 28 Nov 2017 10:44:29 -0500 Subject: [PATCH 0334/1013] packet: fix crash in fanout_demux_rollover() syzkaller found a race condition fanout_demux_rollover() while removing a packet socket from a fanout group. 
po->rollover is read and operated on during packet_rcv_fanout(), via fanout_demux_rollover(), but the pointer is currently cleared before the synchronization in packet_release(). It is safer to delay the cleanup until after synchronize_net() has been called, ensuring all calls to packet_rcv_fanout() for this socket have finished. To further simplify synchronization around the rollover structure, set po->rollover in fanout_add() only if there are no errors. This removes the need for rcu in the struct and in the call to packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...). Crashing stack trace: fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392 packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487 dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953 xmit_one net/core/dev.c:2975 [inline] dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995 __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476 dev_queue_xmit+0x17/0x20 net/core/dev.c:3509 neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379 neigh_output include/net/neighbour.h:482 [inline] ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120 ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146 NF_HOOK_COND include/linux/netfilter.h:239 [inline] ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163 dst_output include/net/dst.h:459 [inline] NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250 mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660 mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072 mld_send_initial_cr net/ipv6/mcast.c:2056 [inline] ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079 addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039 addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971 process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113 worker_thread+0x223/0x1990 kernel/workqueue.c:2247 kthread+0x35e/0x430 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432 Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state") Fixes: 509c7a1ecc860 ("packet: avoid panic in packet_getsockopt()") Reported-by: syzbot Signed-off-by: Mike Maloney Reviewed-by: Eric Dumazet Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 32 ++++++++++---------------------- net/packet/internal.h | 1 - 2 files changed, 10 insertions(+), 23 deletions(-) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 2986941164b1..2eb4f1e5b861 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1697,7 +1697,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags) atomic_long_set(&rollover->num, 0); atomic_long_set(&rollover->num_huge, 0); atomic_long_set(&rollover->num_failed, 0); - po->rollover = rollover; } if (type_flags & PACKET_FANOUT_FLAG_UNIQUEID) { @@ -1755,6 +1754,8 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags) if (refcount_read(&match->sk_ref) < PACKET_FANOUT_MAX) { __dev_remove_pack(&po->prot_hook); po->fanout = match; + po->rollover = rollover; + rollover = NULL; refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1); __fanout_link(sk, po); err = 0; @@ -1768,10 +1769,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags) } out: - if (err && rollover) { - kfree_rcu(rollover, rcu); - po->rollover = NULL; - } + kfree(rollover); mutex_unlock(&fanout_mutex); return err; } @@ -1795,11 +1793,6 @@ static struct packet_fanout *fanout_release(struct sock *sk) list_del(&f->list); else f = NULL; - - if (po->rollover) { - kfree_rcu(po->rollover, rcu); - po->rollover = NULL; - } } mutex_unlock(&fanout_mutex); @@ -3039,6 +3032,7 @@ static int packet_release(struct socket *sock) synchronize_net(); if (f) { + kfree(po->rollover); fanout_release_data(f); kfree(f); } @@ -3853,7 +3847,6 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, void *data = &val; union tpacket_stats_u st; struct tpacket_rollover_stats rstats; - struct packet_rollover *rollover; if (level != SOL_PACKET) return -ENOPROTOOPT; @@ -3932,18 +3925,13 @@ static int packet_getsockopt(struct socket *sock, int level, int optname, 0); break; case PACKET_ROLLOVER_STATS: - rcu_read_lock(); - rollover = rcu_dereference(po->rollover); - if (rollover) { - rstats.tp_all = atomic_long_read(&rollover->num); - rstats.tp_huge = atomic_long_read(&rollover->num_huge); - rstats.tp_failed = atomic_long_read(&rollover->num_failed); - data = &rstats; - lv = sizeof(rstats); - } - rcu_read_unlock(); - if (!rollover) + if (!po->rollover) return -EINVAL; + rstats.tp_all = atomic_long_read(&po->rollover->num); + rstats.tp_huge = atomic_long_read(&po->rollover->num_huge); + rstats.tp_failed = atomic_long_read(&po->rollover->num_failed); + data = &rstats; + lv = sizeof(rstats); break; case PACKET_TX_HAS_OFF: val = po->tp_tx_has_off; diff --git a/net/packet/internal.h b/net/packet/internal.h index 562fbc155006..a1d2b2319ae9 100644 --- a/net/packet/internal.h +++ b/net/packet/internal.h @@ -95,7 +95,6 @@ struct packet_fanout { struct packet_rollover { int sock; - struct rcu_head rcu; atomic_long_t num; atomic_long_t num_huge; atomic_long_t num_failed; From 589983eb9986ea9c851c8906a81781f317207313 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 28 Nov 2017 08:03:30 -0800 Subject: [PATCH 0335/1013] net/packet: fix a race in packet_bind() and packet_notifier() [ Upstream commit 15fe076edea787807a7cdc168df832544b58eba6 ] syzbot reported crashes [1] and provided a C repro easing bug hunting. When/if packet_do_bind() calls __unregister_prot_hook() and releases po->bind_lock, another thread can run packet_notifier() and process an NETDEV_UP event. 
This calls register_prot_hook() and hooks again the socket right before first thread is able to grab again po->bind_lock. Fixes this issue by temporarily setting po->num to 0, as suggested by David Miller. [1] dev_remove_pack: ffff8801bf16fa80 not found ------------[ cut here ]------------ kernel BUG at net/core/dev.c:7945! ( BUG_ON(!list_empty(&dev->ptype_all)); ) invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: device syz0 entered promiscuous mode CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 task: ffff8801cc57a500 task.stack: ffff8801cc588000 RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945 RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293 RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2 RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810 device syz0 entered promiscuous mode RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8 R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0 FS: 0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106 tun_detach drivers/net/tun.c:670 [inline] tun_chr_close+0x49/0x60 drivers/net/tun.c:2845 __fput+0x333/0x7f0 fs/file_table.c:210 ____fput+0x15/0x20 fs/file_table.c:244 task_work_run+0x199/0x270 kernel/task_work.c:113 exit_task_work include/linux/task_work.h:22 [inline] do_exit+0x9bb/0x1ae0 kernel/exit.c:865 do_group_exit+0x149/0x400 kernel/exit.c:968 SYSC_exit_group kernel/exit.c:979 [inline] SyS_exit_group+0x1d/0x20 kernel/exit.c:977 entry_SYSCALL_64_fastpath+0x1f/0x96 RIP: 0033:0x44ad19 Fixes: 30f7ea1c2b5f ("packet: race condition in packet_bind") Signed-off-by: Eric Dumazet Reported-by: syzbot Cc: Francesco Ruggeri Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/packet/af_packet.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index 2eb4f1e5b861..f4a0587b7d5e 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -3101,6 +3101,10 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex, if (need_rehook) { if (po->running) { rcu_read_unlock(); + /* prevents packet_notifier() from calling + * register_prot_hook() + */ + po->num = 0; __unregister_prot_hook(sk, true); rcu_read_lock(); dev_curr = po->prot_hook.dev; @@ -3109,6 +3113,7 @@ static int packet_do_bind(struct sock *sk, const char *name, int ifindex, dev->ifindex); } + BUG_ON(po->running); po->num = proto; po->prot_hook.type = proto; From f8e5ef4ea8f1037f3f7cfd12291f55baa82e5251 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 29 Nov 2017 17:43:57 -0800 Subject: [PATCH 0336/1013] tcp: remove buggy call to tcp_v6_restore_cb() [ Upstream commit 3016dad75b48279e579117ee3ed566ba90a3b023 ] tcp_v6_send_reset() expects to receive an skb with skb->cb[] layout as used in TCP stack. MD5 lookup uses tcp_v6_iif() and tcp_v6_sdif() and thus TCP_SKB_CB(skb)->header.h6 This patch probably fixes RST packets sent on behalf of a timewait md5 ipv6 socket. 
Before Florian patch, tcp_v6_restore_cb() was needed before jumping to no_tcp_socket label. Fixes: 271c3b9b7bda ("tcp: honour SO_BINDTODEVICE for TW_RST case too") Signed-off-by: Eric Dumazet Cc: Florian Westphal Acked-by: Florian Westphal Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/tcp_ipv6.c | 1 - 1 file changed, 1 deletion(-) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 067350ac843a..32ded300633d 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1585,7 +1585,6 @@ static int tcp_v6_rcv(struct sk_buff *skb) tcp_v6_timewait_ack(sk, skb); break; case TCP_TW_RST: - tcp_v6_restore_cb(skb); tcp_v6_send_reset(sk, skb); inet_twsk_deschedule_put(inet_twsk(sk)); goto discard_it; From bea712a8a5aa9c58e06e3bb894dc25633898c450 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Bj=C3=B8rn=20Mork?= Date: Wed, 6 Dec 2017 20:21:24 +0100 Subject: [PATCH 0337/1013] usbnet: fix alignment for frames with no ethernet header MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit a4abd7a80addb4a9547f7dfc7812566b60ec505c ] The qmi_wwan minidriver support a 'raw-ip' mode where frames are received without any ethernet header. This causes alignment issues because the skbs allocated by usbnet are "IP aligned". Fix by allowing minidrivers to disable the additional alignment offset. This is implemented using a per-device flag, since the same minidriver also supports 'ethernet' mode. Fixes: 32f7adf633b9 ("net: qmi_wwan: support "raw IP" mode") Reported-and-tested-by: Jay Foster Signed-off-by: Bjørn Mork Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/qmi_wwan.c | 2 ++ drivers/net/usb/usbnet.c | 5 ++++- include/linux/usb/usbnet.h | 1 + 3 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index f4ed553929f0..81394a4b2803 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -261,9 +261,11 @@ static void qmi_wwan_netdev_setup(struct net_device *net) net->hard_header_len = 0; net->addr_len = 0; net->flags = IFF_POINTOPOINT | IFF_NOARP | IFF_MULTICAST; + set_bit(EVENT_NO_IP_ALIGN, &dev->flags); netdev_dbg(net, "mode: raw IP\n"); } else if (!net->header_ops) { /* don't bother if already set */ ether_setup(net); + clear_bit(EVENT_NO_IP_ALIGN, &dev->flags); netdev_dbg(net, "mode: Ethernet\n"); } diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c index 6510e5cc1817..42baad125a7d 100644 --- a/drivers/net/usb/usbnet.c +++ b/drivers/net/usb/usbnet.c @@ -484,7 +484,10 @@ static int rx_submit (struct usbnet *dev, struct urb *urb, gfp_t flags) return -ENOLINK; } - skb = __netdev_alloc_skb_ip_align(dev->net, size, flags); + if (test_bit(EVENT_NO_IP_ALIGN, &dev->flags)) + skb = __netdev_alloc_skb(dev->net, size, flags); + else + skb = __netdev_alloc_skb_ip_align(dev->net, size, flags); if (!skb) { netif_dbg(dev, rx_err, dev->net, "no rx skb\n"); usbnet_defer_kevent (dev, EVENT_RX_MEMORY); diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 97116379db5f..e87a805cbfef 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -81,6 +81,7 @@ struct usbnet { # define EVENT_RX_KILL 10 # define EVENT_LINK_CHANGE 11 # define EVENT_SET_RX_MODE 12 +# define EVENT_NO_IP_ALIGN 13 }; static inline struct usb_driver *driver_of(struct usb_interface *intf) From 6878a06356a36d69b18916d2f0c483f21e0885d5 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 5 Dec 2017 
12:45:56 -0800 Subject: [PATCH 0338/1013] net: remove hlist_nulls_add_tail_rcu() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit d7efc6c11b277d9d80b99b1334a78bfe7d7edf10 ] Alexander Potapenko reported use of uninitialized memory [1] This happens when inserting a request socket into TCP ehash, in __sk_nulls_add_node_rcu(), since sk_reuseport is not initialized. Bug was added by commit d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") Note that d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix") missed the opportunity to get rid of hlist_nulls_add_tail_rcu() : Both UDP sockets and TCP/DCCP listeners no longer use __sk_nulls_add_node_rcu() for their hash insertion. Since all other sockets have unique 4-tuple, the reuseport status has no special meaning, so we can always use hlist_nulls_add_head_rcu() for them and save few cycles/instructions. [1] ================================================================== BUG: KMSAN: use of uninitialized memory in inet_ehash_insert+0xd40/0x1050 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.0+ #3288 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace:    __dump_stack lib/dump_stack.c:16  dump_stack+0x185/0x1d0 lib/dump_stack.c:52  kmsan_report+0x13f/0x1c0 mm/kmsan/kmsan.c:1016  __msan_warning_32+0x69/0xb0 mm/kmsan/kmsan_instr.c:766  __sk_nulls_add_node_rcu ./include/net/sock.h:684  inet_ehash_insert+0xd40/0x1050 net/ipv4/inet_hashtables.c:413  reqsk_queue_hash_req net/ipv4/inet_connection_sock.c:754  inet_csk_reqsk_queue_hash_add+0x1cc/0x300 net/ipv4/inet_connection_sock.c:765  tcp_conn_request+0x31e7/0x36f0 net/ipv4/tcp_input.c:6414  tcp_v4_conn_request+0x16d/0x220 net/ipv4/tcp_ipv4.c:1314  tcp_rcv_state_process+0x42a/0x7210 net/ipv4/tcp_input.c:5917  tcp_v4_do_rcv+0xa6a/0xcd0 net/ipv4/tcp_ipv4.c:1483  tcp_v4_rcv+0x3de0/0x4ab0 net/ipv4/tcp_ipv4.c:1763  ip_local_deliver_finish+0x6bb/0xcb0 net/ipv4/ip_input.c:216  NF_HOOK ./include/linux/netfilter.h:248  ip_local_deliver+0x3fa/0x480 net/ipv4/ip_input.c:257  dst_input ./include/net/dst.h:477  ip_rcv_finish+0x6fb/0x1540 net/ipv4/ip_input.c:397  NF_HOOK ./include/linux/netfilter.h:248  ip_rcv+0x10f6/0x15c0 net/ipv4/ip_input.c:488  __netif_receive_skb_core+0x36f6/0x3f60 net/core/dev.c:4298  __netif_receive_skb net/core/dev.c:4336  netif_receive_skb_internal+0x63c/0x19c0 net/core/dev.c:4497  napi_skb_finish net/core/dev.c:4858  napi_gro_receive+0x629/0xa50 net/core/dev.c:4889  e1000_receive_skb drivers/net/ethernet/intel/e1000/e1000_main.c:4018  e1000_clean_rx_irq+0x1492/0x1d30 drivers/net/ethernet/intel/e1000/e1000_main.c:4474  e1000_clean+0x43aa/0x5970 drivers/net/ethernet/intel/e1000/e1000_main.c:3819  napi_poll net/core/dev.c:5500  net_rx_action+0x73c/0x1820 net/core/dev.c:5566  __do_softirq+0x4b4/0x8dd kernel/softirq.c:284  invoke_softirq kernel/softirq.c:364  irq_exit+0x203/0x240 kernel/softirq.c:405  exiting_irq+0xe/0x10 ./arch/x86/include/asm/apic.h:638  do_IRQ+0x15e/0x1a0 arch/x86/kernel/irq.c:263  common_interrupt+0x86/0x86 Fixes: d894ba18d4e4 ("soreuseport: fix ordering for mixed v4/v6 sockets") Fixes: d296ba60d8e2 ("soreuseport: Resolve merge conflict for v4/v6 ordering fix") Signed-off-by: Eric Dumazet Reported-by: Alexander Potapenko Acked-by: Craig Gallek Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/rculist_nulls.h | 38 ----------------------------------- include/net/sock.h | 6 +----- 2 files changed, 1 insertion(+), 43 deletions(-) diff --git a/include/linux/rculist_nulls.h b/include/linux/rculist_nulls.h index a328e8181e49..e4b257ff881b 100644 --- a/include/linux/rculist_nulls.h +++ b/include/linux/rculist_nulls.h @@ -100,44 +100,6 @@ static inline void hlist_nulls_add_head_rcu(struct hlist_nulls_node *n, first->pprev = &n->next; } -/** - * hlist_nulls_add_tail_rcu - * @n: the element to add to the hash list. - * @h: the list to add to. - * - * Description: - * Adds the specified element to the end of the specified hlist_nulls, - * while permitting racing traversals. NOTE: tail insertion requires - * list traversal. - * - * The caller must take whatever precautions are necessary - * (such as holding appropriate locks) to avoid racing - * with another list-mutation primitive, such as hlist_nulls_add_head_rcu() - * or hlist_nulls_del_rcu(), running on this same list. - * However, it is perfectly legal to run concurrently with - * the _rcu list-traversal primitives, such as - * hlist_nulls_for_each_entry_rcu(), used to prevent memory-consistency - * problems on Alpha CPUs. Regardless of the type of CPU, the - * list-traversal primitive must be guarded by rcu_read_lock(). - */ -static inline void hlist_nulls_add_tail_rcu(struct hlist_nulls_node *n, - struct hlist_nulls_head *h) -{ - struct hlist_nulls_node *i, *last = NULL; - - for (i = hlist_nulls_first_rcu(h); !is_a_nulls(i); - i = hlist_nulls_next_rcu(i)) - last = i; - - if (last) { - n->next = last->next; - n->pprev = &last->next; - rcu_assign_pointer(hlist_nulls_next_rcu(last), n); - } else { - hlist_nulls_add_head_rcu(n, h); - } -} - /** * hlist_nulls_for_each_entry_rcu - iterate over rcu list of given type * @tpos: the type * to use as a loop cursor. diff --git a/include/net/sock.h b/include/net/sock.h index a6b9a8d1a6df..006580155a87 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -683,11 +683,7 @@ static inline void sk_add_node_rcu(struct sock *sk, struct hlist_head *list) static inline void __sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list) { - if (IS_ENABLED(CONFIG_IPV6) && sk->sk_reuseport && - sk->sk_family == AF_INET6) - hlist_nulls_add_tail_rcu(&sk->sk_nulls_node, list); - else - hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list); + hlist_nulls_add_head_rcu(&sk->sk_nulls_node, list); } static inline void sk_nulls_add_node_rcu(struct sock *sk, struct hlist_nulls_head *list) From 8b01623c73946edf7946945400305dbc5f1c85a6 Mon Sep 17 00:00:00 2001 From: Lars Persson Date: Fri, 1 Dec 2017 11:12:44 +0100 Subject: [PATCH 0339/1013] stmmac: reset last TSO segment size after device open [ Upstream commit 45ab4b13e46325d00f4acdb365d406e941a15f81 ] The mss variable tracks the last max segment size sent to the TSO engine. We do not update the hardware as long as we receive skb:s with the same value in gso_size. During a network device down/up cycle (mapped to stmmac_release() and stmmac_open() callbacks) we issue a reset to the hardware and it forgets the setting for mss. However we did not zero out our mss variable so the next transmission of a gso packet happens with an undefined hardware setting. This triggers a hang in the TSO engine and eventuelly the netdev watchdog will bark. Fixes: f748be531d70 ("stmmac: support new GMAC4") Signed-off-by: Lars Persson Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 16bd50929084..28c4d6fa096c 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -2564,6 +2564,7 @@ static int stmmac_open(struct net_device *dev) priv->dma_buf_sz = STMMAC_ALIGN(buf_sz); priv->rx_copybreak = STMMAC_RX_COPYBREAK; + priv->mss = 0; ret = alloc_dma_desc_resources(priv); if (ret < 0) { From 5fa411e8555fd3e1f7fc77290b7bd9977203fa52 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 1 Dec 2017 10:06:56 -0800 Subject: [PATCH 0340/1013] tcp/dccp: block bh before arming time_wait timer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit cfac7f836a715b91f08c851df915d401a4d52783 ] Maciej Żenczykowski reported some panics in tcp_twsk_destructor() that might be caused by the following bug. timewait timer is pinned to the cpu, because we want to transition timwewait refcount from 0 to 4 in one go, once everything has been initialized. At the time commit ed2e92394589 ("tcp/dccp: fix timewait races in timer handling") was merged, TCP was always running from BH habdler. After commit 5413d1babe8f ("net: do not block BH while processing socket backlog") we definitely can run tcp_time_wait() from process context. We need to block BH in the critical section so that the pinned timer has still its purpose. This bug is more likely to happen under stress and when very small RTO are used in datacenter flows. Fixes: 5413d1babe8f ("net: do not block BH while processing socket backlog") Signed-off-by: Eric Dumazet Reported-by: Maciej Żenczykowski Acked-by: Maciej Żenczykowski Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/dccp/minisocks.c | 6 ++++++ net/ipv4/tcp_minisocks.c | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/net/dccp/minisocks.c b/net/dccp/minisocks.c index abd07a443219..178bb9833311 100644 --- a/net/dccp/minisocks.c +++ b/net/dccp/minisocks.c @@ -57,10 +57,16 @@ void dccp_time_wait(struct sock *sk, int state, int timeo) if (state == DCCP_TIME_WAIT) timeo = DCCP_TIMEWAIT_LEN; + /* tw_timer is pinned, so we need to make sure BH are disabled + * in following section, otherwise timer handler could run before + * we complete the initialization. + */ + local_bh_disable(); inet_twsk_schedule(tw, timeo); /* Linkage updates. */ __inet_twsk_hashdance(tw, sk, &dccp_hashinfo); inet_twsk_put(tw); + local_bh_enable(); } else { /* Sorry, if we're out of memory, just CLOSE this * socket up. We've got bigger problems than diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c index 188a6f31356d..420fecbb98fe 100644 --- a/net/ipv4/tcp_minisocks.c +++ b/net/ipv4/tcp_minisocks.c @@ -312,10 +312,16 @@ void tcp_time_wait(struct sock *sk, int state, int timeo) if (state == TCP_TIME_WAIT) timeo = TCP_TIMEWAIT_LEN; + /* tw_timer is pinned, so we need to make sure BH are disabled + * in following section, otherwise timer handler could run before + * we complete the initialization. + */ + local_bh_disable(); inet_twsk_schedule(tw, timeo); /* Linkage updates. */ __inet_twsk_hashdance(tw, sk, &tcp_hashinfo); inet_twsk_put(tw); + local_bh_enable(); } else { /* Sorry, if we're out of memory, just CLOSE this * socket up. 
We've got bigger problems than From cfa19a2edf31f8c4d90d6e936a65ba2072d67bb9 Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Fri, 1 Dec 2017 10:14:51 +0100 Subject: [PATCH 0341/1013] s390/qeth: build max size GSO skbs on L2 devices [ Upstream commit 0cbff6d4546613330a1c5f139f5c368e4ce33ca1 ] The current GSO skb size limit was copy&pasted over from the L3 path, where it is needed due to a TSO limitation. As L2 devices don't offer TSO support (and thus all GSO skbs are segmented before they reach the driver), there's no reason to restrict the stack in how large it may build the GSO skbs. Fixes: d52aec97e5bc ("qeth: enable scatter/gather in layer 2 mode") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_l2_main.c | 2 -- drivers/s390/net/qeth_l3_main.c | 4 ++-- 2 files changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 760b023eae95..1c977ab10aa7 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -1027,8 +1027,6 @@ static int qeth_l2_setup_netdev(struct qeth_card *card) card->info.broadcast_capable = 1; qeth_l2_request_initial_mac(card); - card->dev->gso_max_size = (QETH_MAX_BUFFER_ELEMENTS(card) - 1) * - PAGE_SIZE; SET_NETDEV_DEV(card->dev, &card->gdev->dev); netif_napi_add(card->dev, &card->napi, qeth_poll, QETH_NAPI_WEIGHT); netif_carrier_off(card->dev); diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 1d8c317553ba..75005bbd80b9 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2989,8 +2989,8 @@ static int qeth_l3_setup_netdev(struct qeth_card *card) NETIF_F_HW_VLAN_CTAG_RX | NETIF_F_HW_VLAN_CTAG_FILTER; netif_keep_dst(card->dev); - card->dev->gso_max_size = (QETH_MAX_BUFFER_ELEMENTS(card) - 1) * - PAGE_SIZE; + netif_set_gso_max_size(card->dev, (QETH_MAX_BUFFER_ELEMENTS(card) - 1) * + PAGE_SIZE); SET_NETDEV_DEV(card->dev, &card->gdev->dev); netif_napi_add(card->dev, &card->napi, qeth_poll, QETH_NAPI_WEIGHT); From 79651202a2228b53973e73da1f6b676579df951e Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Fri, 1 Dec 2017 10:14:49 +0100 Subject: [PATCH 0342/1013] s390/qeth: fix thinko in IPv4 multicast address tracking [ Upsteam commit bc3ab70584696cb798b9e1e0ac8e6ced5fd4c3b8 ] Commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") reworked how secondary addresses are managed for qeth devices. Instead of dropping & subsequently re-adding all addresses on every ndo_set_rx_mode() call, qeth now keeps track of the addresses that are currently registered with the HW. On a ndo_set_rx_mode(), we thus only need to do (de-)registration requests for the addresses that have actually changed. On L3 devices, the lookup for IPv4 Multicast addresses checks the wrong hashtable - and thus never finds a match. As a result, we first delete *all* such addresses, and then re-add them again. So each set_rx_mode() causes a short period where the IPv4 Multicast addresses are not registered, and the card stops forwarding inbound traffic for them. Fix this by setting the ->is_multicast flag on the lookup object, thus enabling qeth_l3_ip_from_hash() to search the correct hashtable and find a match there. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_l3_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 75005bbd80b9..11f65b879322 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -1376,6 +1376,7 @@ qeth_l3_add_mc_to_hash(struct qeth_card *card, struct in_device *in4_dev) tmp->u.a4.addr = be32_to_cpu(im4->multiaddr); memcpy(tmp->mac, buf, sizeof(tmp->mac)); + tmp->is_multicast = 1; ipm = qeth_l3_ip_from_hash(card, tmp); if (ipm) { From effea096caa1732d2eff941c1beabc9c496b087e Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Fri, 1 Dec 2017 10:14:50 +0100 Subject: [PATCH 0343/1013] s390/qeth: fix GSO throughput regression [ Upstream commit 6d69b1f1eb7a2edf8a3547f361c61f2538e054bb ] Using GSO with small MTUs currently results in a substantial throughput regression - which is caused by how qeth needs to map non-linear skbs into its IO buffer elements: compared to a linear skb, each GSO-segmented skb effectively consumes twice as many buffer elements (ie two instead of one) due to the additional header-only part. This causes the Output Queue to be congested with low-utilized IO buffers. Fix this as follows: If the MSS is low enough so that a non-SG GSO segmentation produces order-0 skbs (currently ~3500 byte), opt out from NETIF_F_SG. This is where we anticipate the biggest savings, since an SG-enabled GSO segmentation produces skbs that always consume at least two buffer elements. Larger MSS values continue to get a SG-enabled GSO segmentation, since 1) the relative overhead of the additional header-only buffer element becomes less noticeable, and 2) the linearization overhead increases. With the throughput regression fixed, re-enable NETIF_F_SG by default to reap the significant CPU savings of GSO. Fixes: 5722963a8e83 ("qeth: do not turn on SG per default") Reported-by: Nils Hoppmann Signed-off-by: Julian Wiedmann Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_core.h | 3 +++ drivers/s390/net/qeth_core_main.c | 31 +++++++++++++++++++++++++++++++ drivers/s390/net/qeth_l2_main.c | 2 ++ drivers/s390/net/qeth_l3_main.c | 2 ++ 4 files changed, 38 insertions(+) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 47a13c5723c6..5340efc673a9 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -985,6 +985,9 @@ struct qeth_cmd_buffer *qeth_get_setassparms_cmd(struct qeth_card *, int qeth_set_features(struct net_device *, netdev_features_t); int qeth_recover_features(struct net_device *); netdev_features_t qeth_fix_features(struct net_device *, netdev_features_t); +netdev_features_t qeth_features_check(struct sk_buff *skb, + struct net_device *dev, + netdev_features_t features); int qeth_vm_request_mac(struct qeth_card *card); int qeth_push_hdr(struct sk_buff *skb, struct qeth_hdr **hdr, unsigned int len); diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index bae7440abc01..330e5d3dadf3 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -19,6 +19,11 @@ #include #include #include +#include +#include +#include +#include + #include #include @@ -6505,6 +6510,32 @@ netdev_features_t qeth_fix_features(struct net_device *dev, } EXPORT_SYMBOL_GPL(qeth_fix_features); +netdev_features_t qeth_features_check(struct sk_buff *skb, + struct net_device *dev, + netdev_features_t features) +{ + /* GSO segmentation builds skbs with + * a (small) linear part for the headers, and + * page frags for the data. + * Compared to a linear skb, the header-only part consumes an + * additional buffer element. This reduces buffer utilization, and + * hurts throughput. So compress small segments into one element. 
+ */ + if (netif_needs_gso(skb, features)) { + /* match skb_segment(): */ + unsigned int doffset = skb->data - skb_mac_header(skb); + unsigned int hsize = skb_shinfo(skb)->gso_size; + unsigned int hroom = skb_headroom(skb); + + /* linearize only if resulting skb allocations are order-0: */ + if (SKB_DATA_ALIGN(hroom + doffset + hsize) <= SKB_MAX_HEAD(0)) + features &= ~NETIF_F_SG; + } + + return vlan_features_check(skb, features); +} +EXPORT_SYMBOL_GPL(qeth_features_check); + static int __init qeth_core_init(void) { int rc; diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c index 1c977ab10aa7..5a973ebcb13c 100644 --- a/drivers/s390/net/qeth_l2_main.c +++ b/drivers/s390/net/qeth_l2_main.c @@ -963,6 +963,7 @@ static const struct net_device_ops qeth_l2_netdev_ops = { .ndo_stop = qeth_l2_stop, .ndo_get_stats = qeth_get_stats, .ndo_start_xmit = qeth_l2_hard_start_xmit, + .ndo_features_check = qeth_features_check, .ndo_validate_addr = eth_validate_addr, .ndo_set_rx_mode = qeth_l2_set_rx_mode, .ndo_do_ioctl = qeth_do_ioctl, @@ -1009,6 +1010,7 @@ static int qeth_l2_setup_netdev(struct qeth_card *card) if (card->info.type == QETH_CARD_TYPE_OSD && !card->info.guestlan) { card->dev->hw_features = NETIF_F_SG; card->dev->vlan_features = NETIF_F_SG; + card->dev->features |= NETIF_F_SG; /* OSA 3S and earlier has no RX/TX support */ if (qeth_is_supported(card, IPA_OUTBOUND_CHECKSUM)) { card->dev->hw_features |= NETIF_F_IP_CSUM; diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 11f65b879322..27185ab38136 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -2923,6 +2923,7 @@ static const struct net_device_ops qeth_l3_osa_netdev_ops = { .ndo_stop = qeth_l3_stop, .ndo_get_stats = qeth_get_stats, .ndo_start_xmit = qeth_l3_hard_start_xmit, + .ndo_features_check = qeth_features_check, .ndo_validate_addr = eth_validate_addr, .ndo_set_rx_mode = qeth_l3_set_multicast_list, .ndo_do_ioctl = qeth_do_ioctl, @@ -2963,6 +2964,7 @@ static int qeth_l3_setup_netdev(struct qeth_card *card) card->dev->vlan_features = NETIF_F_SG | NETIF_F_RXCSUM | NETIF_F_IP_CSUM | NETIF_F_TSO; + card->dev->features |= NETIF_F_SG; } } } else if (card->info.type == QETH_CARD_TYPE_IQD) { From efb7dc5a9658de725de8d0b2016dbe198406d93e Mon Sep 17 00:00:00 2001 From: David Ahern Date: Sun, 3 Dec 2017 09:33:00 -0800 Subject: [PATCH 0344/1013] tcp: use IPCB instead of TCP_SKB_CB in inet_exact_dif_match() [ Usptream commit b4d1605a8ea608fd7dc45b926a05d75d340bde4b ] After this fix : ("tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb()"), socket lookups happen while skb->cb[] has not been mangled yet by TCP. Fixes: a04a480d4392 ("net: Require exact match for TCP socket lookups if dif is l3mdev") Signed-off-by: David Ahern Signed-off-by: Eric Dumazet Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index e6d0002a1b0b..5dfc92ff9ffd 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -874,12 +874,11 @@ static inline int tcp_v6_sdif(const struct sk_buff *skb) } #endif -/* TCP_SKB_CB reference means this can not be used from early demux */ static inline bool inet_exact_dif_match(struct net *net, struct sk_buff *skb) { #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV) if (!net->ipv4.sysctl_tcp_l3mdev_accept && - skb && ipv4_l3mdev_skb(TCP_SKB_CB(skb)->header.h4.flags)) + skb && ipv4_l3mdev_skb(IPCB(skb)->flags)) return true; #endif return false; From de514e0609ea1fcf1f70faf427ec9b693d007f8b Mon Sep 17 00:00:00 2001 From: Tommi Rantala Date: Wed, 29 Nov 2017 12:48:42 +0200 Subject: [PATCH 0345/1013] tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() [ Upstream commit c7799c067c2ae33e348508c8afec354f3257ff25 ] Remove the second tipc_rcv() call in tipc_udp_recv(). We have just checked that the bearer is not up, and calling tipc_rcv() with a bearer that is not up leads to a TIPC div-by-zero crash in tipc_node_calculate_timer(). The crash is rare in practice, but can happen like this: We're enabling a bearer, but it's not yet up and fully initialized. At the same time we receive a discovery packet, and in tipc_udp_recv() we end up calling tipc_rcv() with the not-yet-initialized bearer, causing later the div-by-zero crash in tipc_node_calculate_timer(). Jon Maloy explains the impact of removing the second tipc_rcv() call: "link setup in the worst case will be delayed until the next arriving discovery messages, 1 sec later, and this is an acceptable delay." As the tipc_rcv() call is removed, just leave the function via the rcu_out label, so that we will kfree_skb(). [ 12.590450] Own node address <1.1.1>, network identity 1 [ 12.668088] divide error: 0000 [#1] SMP [ 12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1 [ 12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000 [ 12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] [ 12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246 [ 12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000 [ 12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600 [ 12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001 [ 12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8 [ 12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800 [ 12.702338] FS: 0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000 [ 12.705099] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0 [ 12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 12.712627] Call Trace: [ 12.713390] [ 12.714011] tipc_node_check_dest+0x2e8/0x350 [tipc] [ 12.715286] tipc_disc_rcv+0x14d/0x1d0 [tipc] [ 12.716370] tipc_rcv+0x8b0/0xd40 [tipc] [ 12.717396] ? minmax_running_min+0x2f/0x60 [ 12.718248] ? dst_alloc+0x4c/0xa0 [ 12.718964] ? tcp_ack+0xaf1/0x10b0 [ 12.719658] ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc] [ 12.720634] tipc_udp_recv+0x71/0x1d0 [tipc] [ 12.721459] ? 
dst_alloc+0x4c/0xa0 [ 12.722130] udp_queue_rcv_skb+0x264/0x490 [ 12.722924] __udp4_lib_rcv+0x21e/0x990 [ 12.723670] ? ip_route_input_rcu+0x2dd/0xbf0 [ 12.724442] ? tcp_v4_rcv+0x958/0xa40 [ 12.725039] udp_rcv+0x1a/0x20 [ 12.725587] ip_local_deliver_finish+0x97/0x1d0 [ 12.726323] ip_local_deliver+0xaf/0xc0 [ 12.726959] ? ip_route_input_noref+0x19/0x20 [ 12.727689] ip_rcv_finish+0xdd/0x3b0 [ 12.728307] ip_rcv+0x2ac/0x360 [ 12.728839] __netif_receive_skb_core+0x6fb/0xa90 [ 12.729580] ? udp4_gro_receive+0x1a7/0x2c0 [ 12.730274] __netif_receive_skb+0x1d/0x60 [ 12.730953] ? __netif_receive_skb+0x1d/0x60 [ 12.731637] netif_receive_skb_internal+0x37/0xd0 [ 12.732371] napi_gro_receive+0xc7/0xf0 [ 12.732920] receive_buf+0x3c3/0xd40 [ 12.733441] virtnet_poll+0xb1/0x250 [ 12.733944] net_rx_action+0x23e/0x370 [ 12.734476] __do_softirq+0xc5/0x2f8 [ 12.734922] irq_exit+0xfa/0x100 [ 12.735315] do_IRQ+0x4f/0xd0 [ 12.735680] common_interrupt+0xa2/0xa2 [ 12.736126] [ 12.736416] RIP: 0010:native_safe_halt+0x6/0x10 [ 12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d [ 12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000 [ 12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88 [ 12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002 [ 12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000 [ 12.741831] default_idle+0x2a/0x100 [ 12.742323] arch_cpu_idle+0xf/0x20 [ 12.742796] default_idle_call+0x28/0x40 [ 12.743312] do_idle+0x179/0x1f0 [ 12.743761] cpu_startup_entry+0x1d/0x20 [ 12.744291] start_secondary+0x112/0x120 [ 12.744816] secondary_startup_64+0xa5/0xa5 [ 12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00 00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48 89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f [ 12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0 [ 12.748555] ---[ end trace 1399ab83390650fd ]--- [ 12.749296] Kernel panic - not syncing: Fatal exception in interrupt [ 12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 12.751215] Rebooting in 60 seconds.. Fixes: c9b64d492b1f ("tipc: add replicast peer discovery") Signed-off-by: Tommi Rantala Cc: Jon Maloy Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/udp_media.c | 4 ---- 1 file changed, 4 deletions(-) diff --git a/net/tipc/udp_media.c b/net/tipc/udp_media.c index ecca64fc6a6f..3deabcab4882 100644 --- a/net/tipc/udp_media.c +++ b/net/tipc/udp_media.c @@ -371,10 +371,6 @@ static int tipc_udp_recv(struct sock *sk, struct sk_buff *skb) goto rcu_out; } - tipc_rcv(sock_net(sk), skb, b); - rcu_read_unlock(); - return 0; - rcu_out: rcu_read_unlock(); out: From 91c5e6553af065f30390a76bf0556672d6cbb6ab Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 6 Dec 2017 11:08:19 -0800 Subject: [PATCH 0346/1013] tcp: use current time in tcp_rcv_space_adjust() [ Upstream commit 8632385022f2b05a6ca0b9e0f95575865de0e2ce ] When I switched rcv_rtt_est to high resolution timestamps, I forgot that tp->tcp_mstamp needed to be refreshed in tcp_rcv_space_adjust() Using an old timestamp leads to autotuning lags. 
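The underlying mistake is generic: when an interval is computed from a cached clock value, the cache has to be refreshed first, otherwise the delta reflects whenever the clock was last sampled rather than now. A small userspace sketch of the same pattern, using CLOCK_MONOTONIC in place of tp->tcp_mstamp (all names made up):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

static struct timespec cached_now;      /* plays the role of tp->tcp_mstamp */

static void refresh_now(void)           /* counterpart of tcp_mstamp_refresh() */
{
    clock_gettime(CLOCK_MONOTONIC, &cached_now);
}

static long delta_us(const struct timespec *since)
{
    /* The fix amounts to doing this refresh before taking the delta. */
    refresh_now();
    return (cached_now.tv_sec - since->tv_sec) * 1000000L +
           (cached_now.tv_nsec - since->tv_nsec) / 1000L;
}

int main(void)
{
    struct timespec start;

    refresh_now();                      /* cache populated early ...          */
    clock_gettime(CLOCK_MONOTONIC, &start);
    usleep(200 * 1000);                 /* ... then 200 ms of "rx activity"   */

    /* Without the refresh_now() inside delta_us(), the interval would be
     * computed against the stale cache and come out wrong. */
    printf("interval: ~%ld us\n", delta_us(&start));
    return 0;
}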
Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps") Signed-off-by: Eric Dumazet Cc: Wei Wang Cc: Neal Cardwell Cc: Yuchung Cheng Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index b6bb3cdfad09..4b10e79210aa 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -592,6 +592,7 @@ void tcp_rcv_space_adjust(struct sock *sk) int time; int copied; + tcp_mstamp_refresh(tp); time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time); if (time < (tp->rcv_rtt_est.rtt_us >> 3) || tp->rcv_rtt_est.rtt_us == 0) return; From 8067098c04be5a9058128164e9779e4c2e7c6dc1 Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Mon, 27 Nov 2017 18:37:21 +0100 Subject: [PATCH 0347/1013] net: sched: cbq: create block for q->link.block [ Upstream commit d51aae68b142f48232257e96ce317db25445418d ] q->link.block is not initialized, that leads to EINVAL when one tries to add filter there. So initialize it properly. This can be reproduced by: $ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit $ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1 Reported-by: Jaroslav Aster Reported-by: Ivan Vecera Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Jiri Pirko Acked-by: Eelco Chaudron Reviewed-by: Ivan Vecera Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/sch_cbq.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index dcef97fa8047..aeffa320429d 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -1157,9 +1157,13 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt) if ((q->link.R_tab = qdisc_get_rtab(r, tb[TCA_CBQ_RTAB])) == NULL) return -EINVAL; + err = tcf_block_get(&q->link.block, &q->link.filter_list); + if (err) + goto put_rtab; + err = qdisc_class_hash_init(&q->clhash); if (err < 0) - goto put_rtab; + goto put_block; q->link.sibling = &q->link; q->link.common.classid = sch->handle; @@ -1193,6 +1197,9 @@ static int cbq_init(struct Qdisc *sch, struct nlattr *opt) cbq_addprio(q, &q->link); return 0; +put_block: + tcf_block_put(q->link.block); + put_rtab: qdisc_put_rtab(q->link.R_tab); return err; From 616bada6fd46b77351a4033392035de7875c48b6 Mon Sep 17 00:00:00 2001 From: Wei Xu Date: Fri, 1 Dec 2017 05:10:38 -0500 Subject: [PATCH 0348/1013] tap: free skb if flags error [ Upstream commit 61d78537843e676e7f56ac6db333db0c0529b892 ] tap_recvmsg() supports accepting skb by msg_control after commit 3b4ba04acca8 ("tap: support receiving skb from msg_control"), the skb if presented should be freed within the function, otherwise it would be leaked. Signed-off-by: Wei Xu Reported-by: Matthew Rosato Acked-by: Michael S. Tsirkin Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/tap.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index 6c0c84c33e1f..e4c6c78fab3b 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -829,8 +829,11 @@ static ssize_t tap_do_read(struct tap_queue *q, DEFINE_WAIT(wait); ssize_t ret = 0; - if (!iov_iter_count(to)) + if (!iov_iter_count(to)) { + if (skb) + kfree_skb(skb); return 0; + } if (skb) goto put; @@ -1154,11 +1157,14 @@ static int tap_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, int flags) { struct tap_queue *q = container_of(sock, struct tap_queue, sock); + struct sk_buff *skb = m->msg_control; int ret; - if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) + if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) { + if (skb) + kfree_skb(skb); return -EINVAL; - ret = tap_do_read(q, &m->msg_iter, flags & MSG_DONTWAIT, - m->msg_control); + } + ret = tap_do_read(q, &m->msg_iter, flags & MSG_DONTWAIT, skb); if (ret > total_len) { m->msg_flags |= MSG_TRUNC; ret = flags & MSG_TRUNC ? ret : total_len; From 241eb29c019a0b4e2a3ff5ca4b4449374aeb5f87 Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Fri, 17 Nov 2017 21:06:14 -0500 Subject: [PATCH 0349/1013] tcp: when scheduling TLP, time of RTO should account for current ACK [ Upstream commit ed66dfaf236c04d414de1d218441296e57fb2bd2 ] Fix the TLP scheduling logic so that when scheduling a TLP probe, we ensure that the estimated time at which an RTO would fire accounts for the fact that ACKs indicating forward progress should push back RTO times. After the following fix: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") we had an unintentional behavior change in the following kind of scenario: suppose the RTT variance has been very low recently. Then suppose we send out a flight of N packets and our RTT is 100ms: t=0: send a flight of N packets t=100ms: receive an ACK for N-1 packets The response before df92c8394e6e that was: -> schedule a TLP for now + RTO_interval The response after df92c8394e6e is: -> schedule a TLP for t=0 + RTO_interval Since RTO_interval = srtt + RTT_variance, this means that we have scheduled a TLP timer at a point in the future that only accounts for RTT_variance. If the RTT_variance term is small, this means that the timer fires soon. Before df92c8394e6e this would not happen, because in that code, when we receive an ACK for a prefix of flight, we did: 1) Near the top of tcp_ack(), switch from TLP timer to RTO at write_queue_head->paket_tx_time + RTO_interval: if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) tcp_rearm_rto(sk); 2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval: if (flag & FLAG_ACKED) { tcp_rearm_rto(sk); 3) In tcp_ack() after tcp_fastretrans_alert() switch from RTO to TLP at now + RTO_interval: if (icsk->icsk_pending == ICSK_TIME_RETRANS) tcp_schedule_loss_probe(sk); In df92c8394e6e we removed that 3-phase dance, and instead directly set the TLP timer once: we set the TLP timer in cases like this to write_queue_head->packet_tx_time + RTO_interval. So if the RTT variance is small, then this means that this is setting the TLP timer to fire quite soon. This means if the ACK for the tail of the flight takes longer than an RTT to arrive (often due to delayed ACKs), then the TLP timer fires too quickly. 
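The timing difference is easier to see with concrete numbers. Taking the commit's scenario (srtt = 100 ms, very small variance), the sketch below contrasts anchoring the probe on the head packet's transmit time with anchoring it on the ACK that just arrived; the RTO formula is simplified and the values are made up:

#include <stdio.h>

int main(void)
{
    long srtt_ms = 100, rttvar_ms = 2;              /* illustrative values    */
    long rto_ms  = srtt_ms + 4 * rttvar_ms;         /* simplified RTO bound   */

    long head_tx_ms = 0;                            /* flight sent at t=0     */
    long ack_ms     = 100;                          /* ACK for N-1 at t=100   */

    long fires_stale = head_tx_ms + rto_ms;         /* anchored on old tx time */
    long fires_fresh = ack_ms + rto_ms;             /* anchored on the new ACK */

    printf("now = %ld ms\n", ack_ms);
    printf("probe anchored on tx time fires at t=%ld ms (%ld ms from now)\n",
           fires_stale, fires_stale - ack_ms);
    printf("probe anchored on the ACK fires at t=%ld ms (%ld ms from now)\n",
           fires_fresh, fires_fresh - ack_ms);
    return 0;
}

With a small variance the first variant leaves only a few milliseconds before the probe fires, which is exactly the premature TLP described above.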
Fixes: df92c8394e6e ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/tcp.h | 2 +- net/ipv4/tcp_input.c | 2 +- net/ipv4/tcp_output.c | 8 +++++--- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/include/net/tcp.h b/include/net/tcp.h index 5dfc92ff9ffd..6ced69940f5c 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -563,7 +563,7 @@ void tcp_push_one(struct sock *, unsigned int mss_now); void tcp_send_ack(struct sock *sk); void tcp_send_delayed_ack(struct sock *sk); void tcp_send_loss_probe(struct sock *sk); -bool tcp_schedule_loss_probe(struct sock *sk); +bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto); void tcp_skb_collapse_tstamp(struct sk_buff *skb, const struct sk_buff *next_skb); diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index 4b10e79210aa..c5447b9f8517 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -3021,7 +3021,7 @@ void tcp_rearm_rto(struct sock *sk) /* Try to schedule a loss probe; if that doesn't work, then schedule an RTO. */ static void tcp_set_xmit_timer(struct sock *sk) { - if (!tcp_schedule_loss_probe(sk)) + if (!tcp_schedule_loss_probe(sk, true)) tcp_rearm_rto(sk); } diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 478909f4694d..cd3d60bb7cc8 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -2337,7 +2337,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, /* Send one loss probe per tail loss episode. */ if (push_one != 2) - tcp_schedule_loss_probe(sk); + tcp_schedule_loss_probe(sk, false); is_cwnd_limited |= (tcp_packets_in_flight(tp) >= tp->snd_cwnd); tcp_cwnd_validate(sk, is_cwnd_limited); return false; @@ -2345,7 +2345,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle, return !tp->packets_out && tcp_send_head(sk); } -bool tcp_schedule_loss_probe(struct sock *sk) +bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto) { struct inet_connection_sock *icsk = inet_csk(sk); struct tcp_sock *tp = tcp_sk(sk); @@ -2384,7 +2384,9 @@ bool tcp_schedule_loss_probe(struct sock *sk) } /* If the RTO formula yields an earlier time, then use that time. */ - rto_delta_us = tcp_rto_delta_us(sk); /* How far in future is RTO? */ + rto_delta_us = advancing_rto ? + jiffies_to_usecs(inet_csk(sk)->icsk_rto) : + tcp_rto_delta_us(sk); /* How far in future is RTO? */ if (rto_delta_us > 0) timeout = min_t(u32, timeout, usecs_to_jiffies(rto_delta_us)); From 0536add671963d9875dc42f734d1608bef480dc7 Mon Sep 17 00:00:00 2001 From: Wei Xu Date: Fri, 1 Dec 2017 05:10:37 -0500 Subject: [PATCH 0350/1013] tun: free skb in early errors [ Upstream commit c33ee15b3820a03cf8229ba9415084197b827f8c ] tun_recvmsg() supports accepting skb by msg_control after commit ac77cfd4258f ("tun: support receiving skb through msg_control"), the skb if presented should be freed no matter how far it can go along, otherwise it would be leaked. This patch fixes several missed cases. Signed-off-by: Wei Xu Reported-by: Matthew Rosato Acked-by: Michael S. Tsirkin Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/tun.c | 24 ++++++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 42bb820a56c9..a5c47db94846 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1734,8 +1734,11 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile, tun_debug(KERN_INFO, tun, "tun_do_read\n"); - if (!iov_iter_count(to)) + if (!iov_iter_count(to)) { + if (skb) + kfree_skb(skb); return 0; + } if (!skb) { /* Read frames from ring */ @@ -1851,22 +1854,24 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, { struct tun_file *tfile = container_of(sock, struct tun_file, socket); struct tun_struct *tun = __tun_get(tfile); + struct sk_buff *skb = m->msg_control; int ret; - if (!tun) - return -EBADFD; + if (!tun) { + ret = -EBADFD; + goto out_free_skb; + } if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) { ret = -EINVAL; - goto out; + goto out_put_tun; } if (flags & MSG_ERRQUEUE) { ret = sock_recv_errqueue(sock->sk, m, total_len, SOL_PACKET, TUN_TX_TIMESTAMP); goto out; } - ret = tun_do_read(tun, tfile, &m->msg_iter, flags & MSG_DONTWAIT, - m->msg_control); + ret = tun_do_read(tun, tfile, &m->msg_iter, flags & MSG_DONTWAIT, skb); if (ret > (ssize_t)total_len) { m->msg_flags |= MSG_TRUNC; ret = flags & MSG_TRUNC ? ret : total_len; @@ -1874,6 +1879,13 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len, out: tun_put(tun); return ret; + +out_put_tun: + tun_put(tun); +out_free_skb: + if (skb) + kfree_skb(skb); + return ret; } static int tun_peek_len(struct socket *sock) From 7afe2e668ee2b1f465dcccb942da9d8f8d873f20 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Tue, 21 Nov 2017 07:08:57 -0800 Subject: [PATCH 0351/1013] net: ipv6: Fixup device for anycast routes during copy [ Upstream commit 98d11291d189cb5adf49694d0ad1b971c0212697 ] Florian reported a breakage with anycast routes due to commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address"). Prior to this commit anycast routes were added against the loopback device causing repetitive route entries with no insight into why they existed. e.g.: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev lo proto kernel metric 0 pref medium anycast 2001:db8:2:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium anycast fe80:: dev lo proto kernel metric 0 pref medium The point of commit 4832c30d5458 is to add the routes using the device with the address which is causing the route to be added. e.g.,: $ ip -6 ro ls table local type anycast anycast 2001:db8:1:: dev eth1 proto kernel metric 0 pref medium anycast 2001:db8:2:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth2 proto kernel metric 0 pref medium anycast fe80:: dev eth1 proto kernel metric 0 pref medium For traffic to work as it did before, the dst device needs to be switched to the loopback when the copy is created similar to local routes. Fixes: 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") Signed-off-by: David Ahern Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index a96d5b385d8f..598efa8cfe25 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -960,7 +960,7 @@ static struct net_device *ip6_rt_get_dev_rcu(struct rt6_info *rt) { struct net_device *dev = rt->dst.dev; - if (rt->rt6i_flags & RTF_LOCAL) { + if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST)) { /* for copies of local routes, dst->dev needs to be the * device if it is a master device, the master device if * device is enslaved, and the loopback as the default From 29d631e5941366ee5eb50b3b592ca99186115a63 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 19 Nov 2017 19:31:04 +0800 Subject: [PATCH 0352/1013] tun: fix rcu_read_lock imbalance in tun_build_skb [ Upstream commit 654d573845f35017dc397840fa03610fef3d08b0 ] rcu_read_lock in tun_build_skb is used to rcu_dereference tun->xdp_prog safely, rcu_read_unlock should be done in every return path. Now I could see one place missing it, where it returns NULL in switch-case XDP_REDIRECT, another palce using rcu_read_lock wrongly, where it returns NULL in if (xdp_xmit) chunk. So fix both in this patch. Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/tun.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index a5c47db94846..d06f88312e1e 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1326,6 +1326,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, err = xdp_do_redirect(tun->dev, &xdp, xdp_prog); if (err) goto err_redirect; + rcu_read_unlock(); return NULL; case XDP_TX: xdp_xmit = true; @@ -1358,7 +1359,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun, if (xdp_xmit) { skb->dev = tun->dev; generic_xdp_tx(skb, xdp_prog); - rcu_read_lock(); + rcu_read_unlock(); return NULL; } From 60335608e2f1ad6e55acaa4f13bd0f3bcd156dbd Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Tue, 21 Nov 2017 10:22:25 -0500 Subject: [PATCH 0353/1013] net: accept UFO datagrams from tuntap and packet [ Upstream commit 0c19f846d582af919db66a5914a0189f9f92c936 ] Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b5846~1..d9d30adf5677. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 939912216fa8 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee643f1 ("net: avoid skb_warn_bad_offload false positives on UFO"). 
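The feature negotiation referred to above happens through the TUNSETOFFLOAD ioctl. A minimal sketch of how a userspace backend could probe for UFO after this change is shown below; the device name is hypothetical and error handling is trimmed to the essentials:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(void)
{
    struct ifreq ifr;
    int fd = open("/dev/net/tun", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_flags = IFF_TAP | IFF_NO_PI;
    strncpy(ifr.ifr_name, "tap-ufo0", IFNAMSIZ - 1);   /* hypothetical name */
    if (ioctl(fd, TUNSETIFF, &ifr) < 0) { perror("TUNSETIFF"); return 1; }

    /* Ask for checksum + TSO + UFO; a kernel with this patch accepts
     * TUN_F_UFO again, while a 4.14 kernel without it is expected to
     * reject the combination, so the caller should fall back. */
    unsigned long offloads = TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | TUN_F_UFO;
    if (ioctl(fd, TUNSETOFFLOAD, offloads) < 0)
        printf("UFO not accepted, fall back to software segmentation\n");
    else
        printf("UFO offload negotiated\n");

    close(fd);
    return 0;
}

Note that, as the tun.c hunk below shows, the flag is accepted for compatibility but masked out again before it is translated into netdev features.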
(*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn. Tested Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device: host: nc -l -p -u 8000 & tcpdump -n -i tap0 guest: dd if=/dev/zero of=payload.txt bs=1 count=2000 nc -u 192.16.1.1 8000 < payload.txt Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests) Changes v1 -> v2 - simplified set_offload change (review comment) - documented test procedure Link: http://lkml.kernel.org/r/ Fixes: fb652fdfe837 ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: Michal Kubecek Signed-off-by: Willem de Bruijn Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/tap.c | 2 +- drivers/net/tun.c | 2 + include/linux/netdev_features.h | 4 +- include/linux/netdevice.h | 1 + include/linux/skbuff.h | 2 + include/linux/virtio_net.h | 5 +- include/net/ipv6.h | 2 +- net/core/dev.c | 3 +- net/ipv4/af_inet.c | 12 ++++- net/ipv4/udp_offload.c | 49 +++++++++++++++++-- net/ipv6/output_core.c | 6 +-- net/ipv6/udp_offload.c | 85 +++++++++++++++++++++++++++++++-- net/openvswitch/datapath.c | 14 ++++++ net/openvswitch/flow.c | 6 ++- net/sched/act_csum.c | 6 +++ 15 files changed, 181 insertions(+), 18 deletions(-) diff --git a/drivers/net/tap.c b/drivers/net/tap.c index e4c6c78fab3b..bfd4ded0a53f 100644 --- a/drivers/net/tap.c +++ b/drivers/net/tap.c @@ -1080,7 +1080,7 @@ static long tap_ioctl(struct file *file, unsigned int cmd, case TUNSETOFFLOAD: /* let the user check for future flags */ if (arg & ~(TUN_F_CSUM | TUN_F_TSO4 | TUN_F_TSO6 | - TUN_F_TSO_ECN)) + TUN_F_TSO_ECN | TUN_F_UFO)) return -EINVAL; rtnl_lock(); diff --git a/drivers/net/tun.c b/drivers/net/tun.c index d06f88312e1e..c91b110f2169 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -2157,6 +2157,8 @@ static int set_offload(struct tun_struct *tun, unsigned long arg) features |= NETIF_F_TSO6; arg &= ~(TUN_F_TSO4|TUN_F_TSO6); } + + arg &= ~TUN_F_UFO; } /* This gives the user a way to test for new features in future by diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index dc8b4896b77b..b1b0ca7ccb2b 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -54,8 +54,9 @@ enum { NETIF_F_GSO_TUNNEL_REMCSUM_BIT, /* ... TUNNEL with TSO & REMCSUM */ NETIF_F_GSO_SCTP_BIT, /* ... SCTP fragmentation */ NETIF_F_GSO_ESP_BIT, /* ... ESP with TSO */ + NETIF_F_GSO_UDP_BIT, /* ... 
UFO, deprecated except tuntap */ /**/NETIF_F_GSO_LAST = /* last bit, see GSO_MASK */ - NETIF_F_GSO_ESP_BIT, + NETIF_F_GSO_UDP_BIT, NETIF_F_FCOE_CRC_BIT, /* FCoE CRC32 */ NETIF_F_SCTP_CRC_BIT, /* SCTP checksum offload */ @@ -132,6 +133,7 @@ enum { #define NETIF_F_GSO_TUNNEL_REMCSUM __NETIF_F(GSO_TUNNEL_REMCSUM) #define NETIF_F_GSO_SCTP __NETIF_F(GSO_SCTP) #define NETIF_F_GSO_ESP __NETIF_F(GSO_ESP) +#define NETIF_F_GSO_UDP __NETIF_F(GSO_UDP) #define NETIF_F_HW_VLAN_STAG_FILTER __NETIF_F(HW_VLAN_STAG_FILTER) #define NETIF_F_HW_VLAN_STAG_RX __NETIF_F(HW_VLAN_STAG_RX) #define NETIF_F_HW_VLAN_STAG_TX __NETIF_F(HW_VLAN_STAG_TX) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 2eaac7d75af4..46bf7cc7d5d5 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4101,6 +4101,7 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type) BUILD_BUG_ON(SKB_GSO_TUNNEL_REMCSUM != (NETIF_F_GSO_TUNNEL_REMCSUM >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_SCTP != (NETIF_F_GSO_SCTP >> NETIF_F_GSO_SHIFT)); BUILD_BUG_ON(SKB_GSO_ESP != (NETIF_F_GSO_ESP >> NETIF_F_GSO_SHIFT)); + BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT)); return (features & feature) == feature; } diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d448a4804aea..051e0939ec19 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -569,6 +569,8 @@ enum { SKB_GSO_SCTP = 1 << 14, SKB_GSO_ESP = 1 << 15, + + SKB_GSO_UDP = 1 << 16, }; #if BITS_PER_LONG > 32 diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h index 210034c896e3..f144216febc6 100644 --- a/include/linux/virtio_net.h +++ b/include/linux/virtio_net.h @@ -9,7 +9,7 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, const struct virtio_net_hdr *hdr, bool little_endian) { - unsigned short gso_type = 0; + unsigned int gso_type = 0; if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) { switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) { @@ -19,6 +19,9 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, case VIRTIO_NET_HDR_GSO_TCPV6: gso_type = SKB_GSO_TCPV6; break; + case VIRTIO_NET_HDR_GSO_UDP: + gso_type = SKB_GSO_UDP; + break; default: return -EINVAL; } diff --git a/include/net/ipv6.h b/include/net/ipv6.h index 6eac5cf8f1e6..35e9dd2d18ba 100644 --- a/include/net/ipv6.h +++ b/include/net/ipv6.h @@ -727,7 +727,7 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add __be32 ipv6_select_ident(struct net *net, const struct in6_addr *daddr, const struct in6_addr *saddr); -void ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb); +__be32 ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb); int ip6_dst_hoplimit(struct dst_entry *dst); diff --git a/net/core/dev.c b/net/core/dev.c index 11596a302a26..27357fc1730b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2735,7 +2735,8 @@ EXPORT_SYMBOL(skb_mac_gso_segment); static inline bool skb_needs_check(struct sk_buff *skb, bool tx_path) { if (tx_path) - return skb->ip_summed != CHECKSUM_PARTIAL; + return skb->ip_summed != CHECKSUM_PARTIAL && + skb->ip_summed != CHECKSUM_UNNECESSARY; return skb->ip_summed == CHECKSUM_NONE; } diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index e31108e5ef79..b9d9a2b8792c 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1221,9 +1221,10 @@ EXPORT_SYMBOL(inet_sk_rebuild_header); struct sk_buff *inet_gso_segment(struct sk_buff *skb, netdev_features_t features) { - bool fixedid = false, gso_partial, encap; + bool udpfrag = 
false, fixedid = false, gso_partial, encap; struct sk_buff *segs = ERR_PTR(-EINVAL); const struct net_offload *ops; + unsigned int offset = 0; struct iphdr *iph; int proto, tot_len; int nhoff; @@ -1258,6 +1259,7 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb, segs = ERR_PTR(-EPROTONOSUPPORT); if (!skb->encapsulation || encap) { + udpfrag = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP); fixedid = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TCP_FIXEDID); /* fixed ID is invalid if DF bit is not set */ @@ -1277,7 +1279,13 @@ struct sk_buff *inet_gso_segment(struct sk_buff *skb, skb = segs; do { iph = (struct iphdr *)(skb_mac_header(skb) + nhoff); - if (skb_is_gso(skb)) { + if (udpfrag) { + iph->frag_off = htons(offset >> 3); + if (skb->next) + iph->frag_off |= htons(IP_MF); + offset += skb->len - nhoff - ihl; + tot_len = skb->len - nhoff; + } else if (skb_is_gso(skb)) { if (!fixedid) { iph->id = htons(id); id += skb_shinfo(skb)->gso_segs; diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index e360d55be555..01801b77bd0d 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -187,16 +187,57 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb, } EXPORT_SYMBOL(skb_udp_tunnel_segment); -static struct sk_buff *udp4_tunnel_segment(struct sk_buff *skb, - netdev_features_t features) +static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb, + netdev_features_t features) { struct sk_buff *segs = ERR_PTR(-EINVAL); + unsigned int mss; + __wsum csum; + struct udphdr *uh; + struct iphdr *iph; if (skb->encapsulation && (skb_shinfo(skb)->gso_type & - (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))) + (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))) { segs = skb_udp_tunnel_segment(skb, features, false); + goto out; + } + + if (!pskb_may_pull(skb, sizeof(struct udphdr))) + goto out; + + mss = skb_shinfo(skb)->gso_size; + if (unlikely(skb->len <= mss)) + goto out; + + /* Do software UFO. Complete and fill in the UDP checksum as + * HW cannot do checksum of UDP packets sent as multiple + * IP fragments. + */ + uh = udp_hdr(skb); + iph = ip_hdr(skb); + + uh->check = 0; + csum = skb_checksum(skb, 0, skb->len, 0); + uh->check = udp_v4_check(skb->len, iph->saddr, iph->daddr, csum); + if (uh->check == 0) + uh->check = CSUM_MANGLED_0; + + skb->ip_summed = CHECKSUM_UNNECESSARY; + + /* If there is no outer header we can fake a checksum offload + * due to the fact that we have already done the checksum in + * software prior to segmenting the frame. + */ + if (!skb->encap_hdr_csum) + features |= NETIF_F_HW_CSUM; + + /* Fragment the skb. IP headers of the fragments are updated in + * inet_gso_segment() + */ + segs = skb_segment(skb, features); +out: return segs; } @@ -330,7 +371,7 @@ static int udp4_gro_complete(struct sk_buff *skb, int nhoff) static const struct net_offload udpv4_offload = { .callbacks = { - .gso_segment = udp4_tunnel_segment, + .gso_segment = udp4_ufo_fragment, .gro_receive = udp4_gro_receive, .gro_complete = udp4_gro_complete, }, diff --git a/net/ipv6/output_core.c b/net/ipv6/output_core.c index a338bbc33cf3..4fe7c90962dd 100644 --- a/net/ipv6/output_core.c +++ b/net/ipv6/output_core.c @@ -39,7 +39,7 @@ static u32 __ipv6_select_ident(struct net *net, u32 hashrnd, * * The network header must be set before calling this. 
*/ -void ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb) +__be32 ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb) { static u32 ip6_proxy_idents_hashrnd __read_mostly; struct in6_addr buf[2]; @@ -51,14 +51,14 @@ void ipv6_proxy_select_ident(struct net *net, struct sk_buff *skb) offsetof(struct ipv6hdr, saddr), sizeof(buf), buf); if (!addrs) - return; + return 0; net_get_random_once(&ip6_proxy_idents_hashrnd, sizeof(ip6_proxy_idents_hashrnd)); id = __ipv6_select_ident(net, ip6_proxy_idents_hashrnd, &addrs[1], &addrs[0]); - skb_shinfo(skb)->ip6_frag_id = htonl(id); + return htonl(id); } EXPORT_SYMBOL_GPL(ipv6_proxy_select_ident); diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index 455fd4e39333..a0f89ad76f9d 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -17,15 +17,94 @@ #include #include "ip6_offload.h" -static struct sk_buff *udp6_tunnel_segment(struct sk_buff *skb, - netdev_features_t features) +static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, + netdev_features_t features) { struct sk_buff *segs = ERR_PTR(-EINVAL); + unsigned int mss; + unsigned int unfrag_ip6hlen, unfrag_len; + struct frag_hdr *fptr; + u8 *packet_start, *prevhdr; + u8 nexthdr; + u8 frag_hdr_sz = sizeof(struct frag_hdr); + __wsum csum; + int tnl_hlen; + int err; + + mss = skb_shinfo(skb)->gso_size; + if (unlikely(skb->len <= mss)) + goto out; if (skb->encapsulation && skb_shinfo(skb)->gso_type & (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM)) segs = skb_udp_tunnel_segment(skb, features, true); + else { + const struct ipv6hdr *ipv6h; + struct udphdr *uh; + + if (!pskb_may_pull(skb, sizeof(struct udphdr))) + goto out; + + /* Do software UFO. Complete and fill in the UDP checksum as HW cannot + * do checksum of UDP packets sent as multiple IP fragments. + */ + + uh = udp_hdr(skb); + ipv6h = ipv6_hdr(skb); + + uh->check = 0; + csum = skb_checksum(skb, 0, skb->len, 0); + uh->check = udp_v6_check(skb->len, &ipv6h->saddr, + &ipv6h->daddr, csum); + if (uh->check == 0) + uh->check = CSUM_MANGLED_0; + + skb->ip_summed = CHECKSUM_UNNECESSARY; + + /* If there is no outer header we can fake a checksum offload + * due to the fact that we have already done the checksum in + * software prior to segmenting the frame. + */ + if (!skb->encap_hdr_csum) + features |= NETIF_F_HW_CSUM; + + /* Check if there is enough headroom to insert fragment header. */ + tnl_hlen = skb_tnl_header_len(skb); + if (skb->mac_header < (tnl_hlen + frag_hdr_sz)) { + if (gso_pskb_expand_head(skb, tnl_hlen + frag_hdr_sz)) + goto out; + } + + /* Find the unfragmentable header and shift it left by frag_hdr_sz + * bytes to insert fragment header. + */ + err = ip6_find_1stfragopt(skb, &prevhdr); + if (err < 0) + return ERR_PTR(err); + unfrag_ip6hlen = err; + nexthdr = *prevhdr; + *prevhdr = NEXTHDR_FRAGMENT; + unfrag_len = (skb_network_header(skb) - skb_mac_header(skb)) + + unfrag_ip6hlen + tnl_hlen; + packet_start = (u8 *) skb->head + SKB_GSO_CB(skb)->mac_offset; + memmove(packet_start-frag_hdr_sz, packet_start, unfrag_len); + + SKB_GSO_CB(skb)->mac_offset -= frag_hdr_sz; + skb->mac_header -= frag_hdr_sz; + skb->network_header -= frag_hdr_sz; + + fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen); + fptr->nexthdr = nexthdr; + fptr->reserved = 0; + fptr->identification = ipv6_proxy_select_ident(dev_net(skb->dev), skb); + + /* Fragment the skb. 
ipv6 header and the remaining fields of the + * fragment header are updated in ipv6_gso_segment() + */ + segs = skb_segment(skb, features); + } +out: return segs; } @@ -75,7 +154,7 @@ static int udp6_gro_complete(struct sk_buff *skb, int nhoff) static const struct net_offload udpv6_offload = { .callbacks = { - .gso_segment = udp6_tunnel_segment, + .gso_segment = udp6_ufo_fragment, .gro_receive = udp6_gro_receive, .gro_complete = udp6_gro_complete, }, diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index c3aec6227c91..294444bb075c 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -335,6 +335,8 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, const struct dp_upcall_info *upcall_info, uint32_t cutlen) { + unsigned short gso_type = skb_shinfo(skb)->gso_type; + struct sw_flow_key later_key; struct sk_buff *segs, *nskb; int err; @@ -345,9 +347,21 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, if (segs == NULL) return -EINVAL; + if (gso_type & SKB_GSO_UDP) { + /* The initial flow key extracted by ovs_flow_key_extract() + * in this case is for a first fragment, so we need to + * properly mark later fragments. + */ + later_key = *key; + later_key.ip.frag = OVS_FRAG_TYPE_LATER; + } + /* Queue all of the segments. */ skb = segs; do { + if (gso_type & SKB_GSO_UDP && skb != segs) + key = &later_key; + err = queue_userspace_packet(dp, skb, key, upcall_info, cutlen); if (err) break; diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index 8c94cef25a72..cfb652a4e007 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -584,7 +584,8 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) key->ip.frag = OVS_FRAG_TYPE_LATER; return 0; } - if (nh->frag_off & htons(IP_MF)) + if (nh->frag_off & htons(IP_MF) || + skb_shinfo(skb)->gso_type & SKB_GSO_UDP) key->ip.frag = OVS_FRAG_TYPE_FIRST; else key->ip.frag = OVS_FRAG_TYPE_NONE; @@ -700,6 +701,9 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) if (key->ip.frag == OVS_FRAG_TYPE_LATER) return 0; + if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP) + key->ip.frag = OVS_FRAG_TYPE_FIRST; + /* Transport layer. */ if (key->ip.proto == NEXTHDR_TCP) { if (tcphdr_ok(skb)) { diff --git a/net/sched/act_csum.c b/net/sched/act_csum.c index 1c40caadcff9..d836f998117b 100644 --- a/net/sched/act_csum.c +++ b/net/sched/act_csum.c @@ -229,6 +229,9 @@ static int tcf_csum_ipv4_udp(struct sk_buff *skb, unsigned int ihl, const struct iphdr *iph; u16 ul; + if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type & SKB_GSO_UDP) + return 1; + /* * Support both UDP and UDPLITE checksum algorithms, Don't use * udph->len to get the real length without any protocol check, @@ -282,6 +285,9 @@ static int tcf_csum_ipv6_udp(struct sk_buff *skb, unsigned int ihl, const struct ipv6hdr *ip6h; u16 ul; + if (skb_is_gso(skb) && skb_shinfo(skb)->gso_type & SKB_GSO_UDP) + return 1; + /* * Support both UDP and UDPLITE checksum algorithms, Don't use * udph->len to get the real length without any protocol check, From 87ff3fb30de11a64d25a4a22f23def1ce77cf840 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Sat, 25 Nov 2017 13:14:40 -0600 Subject: [PATCH 0354/1013] net: openvswitch: datapath: fix data type in queue_gso_packets [ Upstream commit 2734166e89639c973c6e125ac8bcfc2d9db72b70 ] gso_type is being used in binary AND operations together with SKB_GSO_UDP. 
The issue is that variable gso_type is of type unsigned short and SKB_GSO_UDP expands to more than 16 bits: SKB_GSO_UDP = 1 << 16 this makes any binary AND operation between gso_type and SKB_GSO_UDP always evaluate to zero, hence making some code unreachable and likely causing undesired behavior. Fix this by changing the data type of variable gso_type to unsigned int. Addresses-Coverity-ID: 1462223 Fixes: 0c19f846d582 ("net: accept UFO datagrams from tuntap and packet") Signed-off-by: Gustavo A. R. Silva Acked-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/openvswitch/datapath.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 294444bb075c..363dd904733d 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -335,7 +335,7 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, const struct dp_upcall_info *upcall_info, uint32_t cutlen) { - unsigned short gso_type = skb_shinfo(skb)->gso_type; + unsigned int gso_type = skb_shinfo(skb)->gso_type; struct sw_flow_key later_key; struct sk_buff *segs, *nskb; int err; From 627a5956119fd58bd4b94a26437d449830fc4736 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 27 Nov 2017 11:11:41 -0800 Subject: [PATCH 0355/1013] cls_bpf: don't decrement net's refcount when offload fails [ Upstream commit 25415cec502a1232b19fffc85465882b19a90415 ] When cls_bpf offload was added it seemed like a good idea to call cls_bpf_delete_prog() instead of extending the error handling path, since the software state is fully initialized at that point. This handling of errors without jumping to the end of the function is error prone, as proven by a later commit missing that extra call to __cls_bpf_delete_prog(). __cls_bpf_delete_prog() is now expected to be invoked with a reference on exts->net or the field zeroed out. The call on the offload's error path does not fulfill this requirement, leading to each error stealing a reference on the net namespace. Create a function undoing what cls_bpf_set_parms() did and use it from __cls_bpf_delete_prog() and the error path. Fixes: aae2c35ec892 ("cls_bpf: use tcf_exts_get_net() before call_rcu()") Signed-off-by: Jakub Kicinski Reviewed-by: Simon Horman Acked-by: Daniel Borkmann Acked-by: Cong Wang Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/cls_bpf.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c index 990eb4d91d54..3a499530f321 100644 --- a/net/sched/cls_bpf.c +++ b/net/sched/cls_bpf.c @@ -246,11 +246,8 @@ static int cls_bpf_init(struct tcf_proto *tp) return 0; } -static void __cls_bpf_delete_prog(struct cls_bpf_prog *prog) +static void cls_bpf_free_parms(struct cls_bpf_prog *prog) { - tcf_exts_destroy(&prog->exts); - tcf_exts_put_net(&prog->exts); - if (cls_bpf_is_ebpf(prog)) bpf_prog_put(prog->filter); else @@ -258,6 +255,14 @@ static void __cls_bpf_delete_prog(struct cls_bpf_prog *prog) kfree(prog->bpf_name); kfree(prog->bpf_ops); +} + +static void __cls_bpf_delete_prog(struct cls_bpf_prog *prog) +{ + tcf_exts_destroy(&prog->exts); + tcf_exts_put_net(&prog->exts); + + cls_bpf_free_parms(prog); kfree(prog); } @@ -509,10 +514,8 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, goto errout; ret = cls_bpf_offload(tp, prog, oldprog); - if (ret) { - __cls_bpf_delete_prog(prog); - return ret; - } + if (ret) + goto errout_parms; if (!tc_in_hw(prog->gen_flags)) prog->gen_flags |= TCA_CLS_FLAGS_NOT_IN_HW; @@ -529,6 +532,8 @@ static int cls_bpf_change(struct net *net, struct sk_buff *in_skb, *arg = prog; return 0; +errout_parms: + cls_bpf_free_parms(prog); errout: tcf_exts_destroy(&prog->exts); kfree(prog); From 0c882334e2b778c95858d38f2180819d887fa5c3 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 26 Nov 2017 20:56:07 +0800 Subject: [PATCH 0356/1013] sctp: use right member as the param of list_for_each_entry [ Upstream commit a8dd397903a6e57157f6265911f7d35681364427 ] Commit d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock") made a mistake that using 'list' as the param of list_for_each_entry to traverse the retransmit, sacked and abandoned queues, while chunks are using 'transmitted_list' to link into these queues. It could cause NULL dereference panic if there are chunks in any of these queues when peeling off one asoc. So use the chunk member 'transmitted_list' instead in this patch. Fixes: d04adf1b3551 ("sctp: reset owner sk for data chunks on out queues when migrating a sock") Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 14c28fbfe6b8..d6163f7aefb1 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -187,13 +187,13 @@ static void sctp_for_each_tx_datachunk(struct sctp_association *asoc, list_for_each_entry(chunk, &t->transmitted, transmitted_list) cb(chunk); - list_for_each_entry(chunk, &q->retransmit, list) + list_for_each_entry(chunk, &q->retransmit, transmitted_list) cb(chunk); - list_for_each_entry(chunk, &q->sacked, list) + list_for_each_entry(chunk, &q->sacked, transmitted_list) cb(chunk); - list_for_each_entry(chunk, &q->abandoned, list) + list_for_each_entry(chunk, &q->abandoned, transmitted_list) cb(chunk); list_for_each_entry(chunk, &q->out_chunk_list, list) From 7a33cd0ef9dda4692cb15fb37eb2e5ab0ef19609 Mon Sep 17 00:00:00 2001 From: Masamitsu Yamazaki Date: Wed, 15 Nov 2017 07:33:14 +0000 Subject: [PATCH 0357/1013] ipmi: Stop timers before cleaning up the module commit 4f7f5551a760eb0124267be65763008169db7087 upstream. 
System may crash after unloading ipmi_si.ko module because a timer may remain and fire after the module cleaned up resources. cleanup_one_si() contains the following processing. /* * Make sure that interrupts, the timer and the thread are * stopped and will not run again. */ if (to_clean->irq_cleanup) to_clean->irq_cleanup(to_clean); wait_for_timer_and_thread(to_clean); /* * Timeouts are stopped, now make sure the interrupts are off * in the BMC. Note that timers and CPU interrupts are off, * so no need for locks. */ while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) { poll(to_clean); schedule_timeout_uninterruptible(1); } si_state changes as following in the while loop calling poll(to_clean). SI_GETTING_MESSAGES => SI_CHECKING_ENABLES => SI_SETTING_ENABLES => SI_GETTING_EVENTS => SI_NORMAL As written in the code comments above, timers are expected to stop before the polling loop and not to run again. But the timer is set again in the following process when si_state becomes SI_SETTING_ENABLES. => poll => smi_event_handler => handle_transaction_done // smi_info->si_state == SI_SETTING_ENABLES => start_getting_events => start_new_msg => smi_mod_timer => mod_timer As a result, before the timer set in start_new_msg() expires, the polling loop may see si_state becoming SI_NORMAL and the module clean-up finishes. For example, hard LOCKUP and panic occurred as following. smi_timeout was called after smi_event_handler, kcs_event and hangs at port_inb() trying to access I/O port after release. [exception RIP: port_inb+19] RIP: ffffffffc0473053 RSP: ffff88069fdc3d80 RFLAGS: 00000006 RAX: ffff8806800f8e00 RBX: ffff880682bd9400 RCX: 0000000000000000 RDX: 0000000000000ca3 RSI: 0000000000000ca3 RDI: ffff8806800f8e40 RBP: ffff88069fdc3d80 R8: ffffffff81d86dfc R9: ffffffff81e36426 R10: 00000000000509f0 R11: 0000000000100000 R12: 0000000000]:000000 R13: 0000000000000000 R14: 0000000000000246 R15: ffff8806800f8e00 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 --- --- To fix the problem I defined a flag, timer_can_start, as member of struct smi_info. The flag is enabled immediately after initializing the timer and disabled immediately before waiting for timer deletion. Fixes: 0cfec916e86d ("ipmi: Start the timer and thread on internal msgs") Signed-off-by: Yamazaki Masamitsu [Some fairly major changes went into the IPMI driver in 4.15, so this required a backport as the code had changed and moved to a different file.] Signed-off-by: Corey Minyard Signed-off-by: Greg Kroah-Hartman --- drivers/char/ipmi/ipmi_si_intf.c | 44 +++++++++++++++++--------------- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index bc3984ffe867..c04aa11f0e21 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -242,6 +242,9 @@ struct smi_info { /* The timer for this si. 
*/ struct timer_list si_timer; + /* This flag is set, if the timer can be set */ + bool timer_can_start; + /* This flag is set, if the timer is running (timer_pending() isn't enough) */ bool timer_running; @@ -417,6 +420,8 @@ static enum si_sm_result start_next_msg(struct smi_info *smi_info) static void smi_mod_timer(struct smi_info *smi_info, unsigned long new_val) { + if (!smi_info->timer_can_start) + return; smi_info->last_timeout_jiffies = jiffies; mod_timer(&smi_info->si_timer, new_val); smi_info->timer_running = true; @@ -436,21 +441,18 @@ static void start_new_msg(struct smi_info *smi_info, unsigned char *msg, smi_info->handlers->start_transaction(smi_info->si_sm, msg, size); } -static void start_check_enables(struct smi_info *smi_info, bool start_timer) +static void start_check_enables(struct smi_info *smi_info) { unsigned char msg[2]; msg[0] = (IPMI_NETFN_APP_REQUEST << 2); msg[1] = IPMI_GET_BMC_GLOBAL_ENABLES_CMD; - if (start_timer) - start_new_msg(smi_info, msg, 2); - else - smi_info->handlers->start_transaction(smi_info->si_sm, msg, 2); + start_new_msg(smi_info, msg, 2); smi_info->si_state = SI_CHECKING_ENABLES; } -static void start_clear_flags(struct smi_info *smi_info, bool start_timer) +static void start_clear_flags(struct smi_info *smi_info) { unsigned char msg[3]; @@ -459,10 +461,7 @@ static void start_clear_flags(struct smi_info *smi_info, bool start_timer) msg[1] = IPMI_CLEAR_MSG_FLAGS_CMD; msg[2] = WDT_PRE_TIMEOUT_INT; - if (start_timer) - start_new_msg(smi_info, msg, 3); - else - smi_info->handlers->start_transaction(smi_info->si_sm, msg, 3); + start_new_msg(smi_info, msg, 3); smi_info->si_state = SI_CLEARING_FLAGS; } @@ -497,11 +496,11 @@ static void start_getting_events(struct smi_info *smi_info) * Note that we cannot just use disable_irq(), since the interrupt may * be shared. */ -static inline bool disable_si_irq(struct smi_info *smi_info, bool start_timer) +static inline bool disable_si_irq(struct smi_info *smi_info) { if ((smi_info->irq) && (!smi_info->interrupt_disabled)) { smi_info->interrupt_disabled = true; - start_check_enables(smi_info, start_timer); + start_check_enables(smi_info); return true; } return false; @@ -511,7 +510,7 @@ static inline bool enable_si_irq(struct smi_info *smi_info) { if ((smi_info->irq) && (smi_info->interrupt_disabled)) { smi_info->interrupt_disabled = false; - start_check_enables(smi_info, true); + start_check_enables(smi_info); return true; } return false; @@ -529,7 +528,7 @@ static struct ipmi_smi_msg *alloc_msg_handle_irq(struct smi_info *smi_info) msg = ipmi_alloc_smi_msg(); if (!msg) { - if (!disable_si_irq(smi_info, true)) + if (!disable_si_irq(smi_info)) smi_info->si_state = SI_NORMAL; } else if (enable_si_irq(smi_info)) { ipmi_free_smi_msg(msg); @@ -545,7 +544,7 @@ static void handle_flags(struct smi_info *smi_info) /* Watchdog pre-timeout */ smi_inc_stat(smi_info, watchdog_pretimeouts); - start_clear_flags(smi_info, true); + start_clear_flags(smi_info); smi_info->msg_flags &= ~WDT_PRE_TIMEOUT_INT; if (smi_info->intf) ipmi_smi_watchdog_pretimeout(smi_info->intf); @@ -928,7 +927,7 @@ static enum si_sm_result smi_event_handler(struct smi_info *smi_info, * disable and messages disabled. */ if (smi_info->supports_event_msg_buff || smi_info->irq) { - start_check_enables(smi_info, true); + start_check_enables(smi_info); } else { smi_info->curr_msg = alloc_msg_handle_irq(smi_info); if (!smi_info->curr_msg) @@ -1235,6 +1234,7 @@ static int smi_start_processing(void *send_info, /* Set up the timer that drives the interface. 
*/ setup_timer(&new_smi->si_timer, smi_timeout, (long)new_smi); + new_smi->timer_can_start = true; smi_mod_timer(new_smi, jiffies + SI_TIMEOUT_JIFFIES); /* Try to claim any interrupts. */ @@ -3416,10 +3416,12 @@ static void check_for_broken_irqs(struct smi_info *smi_info) check_set_rcv_irq(smi_info); } -static inline void wait_for_timer_and_thread(struct smi_info *smi_info) +static inline void stop_timer_and_thread(struct smi_info *smi_info) { if (smi_info->thread != NULL) kthread_stop(smi_info->thread); + + smi_info->timer_can_start = false; if (smi_info->timer_running) del_timer_sync(&smi_info->si_timer); } @@ -3605,7 +3607,7 @@ static int try_smi_init(struct smi_info *new_smi) * Start clearing the flags before we enable interrupts or the * timer to avoid racing with the timer. */ - start_clear_flags(new_smi, false); + start_clear_flags(new_smi); /* * IRQ is defined to be set when non-zero. req_events will @@ -3674,7 +3676,7 @@ static int try_smi_init(struct smi_info *new_smi) return 0; out_err_stop_timer: - wait_for_timer_and_thread(new_smi); + stop_timer_and_thread(new_smi); out_err: new_smi->interrupt_disabled = true; @@ -3866,7 +3868,7 @@ static void cleanup_one_si(struct smi_info *to_clean) */ if (to_clean->irq_cleanup) to_clean->irq_cleanup(to_clean); - wait_for_timer_and_thread(to_clean); + stop_timer_and_thread(to_clean); /* * Timeouts are stopped, now make sure the interrupts are off @@ -3878,7 +3880,7 @@ static void cleanup_one_si(struct smi_info *to_clean) schedule_timeout_uninterruptible(1); } if (to_clean->handlers) - disable_si_irq(to_clean, false); + disable_si_irq(to_clean); while (to_clean->curr_msg || (to_clean->si_state != SI_NORMAL)) { poll(to_clean); schedule_timeout_uninterruptible(1); From 4b57542126669a5ad8bede61c8e9e07a5fc4a39b Mon Sep 17 00:00:00 2001 From: Vincent Pelletier Date: Sun, 26 Nov 2017 06:52:53 +0000 Subject: [PATCH 0358/1013] usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping commit 30bf90ccdec1da9c8198b161ecbff39ce4e5a9ba upstream. Found using DEBUG_ATOMIC_SLEEP while submitting an AIO read operation: [ 100.853642] BUG: sleeping function called from invalid context at mm/slab.h:421 [ 100.861148] in_atomic(): 1, irqs_disabled(): 1, pid: 1880, name: python [ 100.867954] 2 locks held by python/1880: [ 100.867961] #0: (&epfile->mutex){....}, at: [] ffs_mutex_lock+0x27/0x30 [usb_f_fs] [ 100.868020] #1: (&(&ffs->eps_lock)->rlock){....}, at: [] ffs_epfile_io.isra.17+0x24b/0x590 [usb_f_fs] [ 100.868076] CPU: 1 PID: 1880 Comm: python Not tainted 4.14.0-edison+ #118 [ 100.868085] Hardware name: Intel Corporation Merrifield/BODEGA BAY, BIOS 542 2015.01.21:18.19.48 [ 100.868093] Call Trace: [ 100.868122] dump_stack+0x47/0x62 [ 100.868156] ___might_sleep+0xfd/0x110 [ 100.868182] __might_sleep+0x68/0x70 [ 100.868217] kmem_cache_alloc_trace+0x4b/0x200 [ 100.868248] ? dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3] [ 100.868302] dwc3_gadget_ep_alloc_request+0x24/0xe0 [dwc3] [ 100.868343] usb_ep_alloc_request+0x16/0xc0 [udc_core] [ 100.868386] ffs_epfile_io.isra.17+0x444/0x590 [usb_f_fs] [ 100.868424] ? _raw_spin_unlock_irqrestore+0x27/0x40 [ 100.868457] ? kiocb_set_cancel_fn+0x57/0x60 [ 100.868477] ? ffs_ep0_poll+0xc0/0xc0 [usb_f_fs] [ 100.868512] ffs_epfile_read_iter+0xfe/0x157 [usb_f_fs] [ 100.868551] ? security_file_permission+0x9c/0xd0 [ 100.868587] ? rw_verify_area+0xac/0x120 [ 100.868633] aio_read+0x9d/0x100 [ 100.868692] ? __fget+0xa2/0xd0 [ 100.868727] ? 
__might_sleep+0x68/0x70 [ 100.868763] SyS_io_submit+0x471/0x680 [ 100.868878] do_int80_syscall_32+0x4e/0xd0 [ 100.868921] entry_INT80_32+0x2a/0x2a [ 100.868932] EIP: 0xb7fbb676 [ 100.868941] EFLAGS: 00000292 CPU: 1 [ 100.868951] EAX: ffffffda EBX: b7aa2000 ECX: 00000002 EDX: b7af8368 [ 100.868961] ESI: b7fbb660 EDI: b7aab000 EBP: bfb6c658 ESP: bfb6c638 [ 100.868973] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b Signed-off-by: Vincent Pelletier Signed-off-by: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- drivers/usb/gadget/function/f_fs.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c index 0202e5132fa7..876cdbec1307 100644 --- a/drivers/usb/gadget/function/f_fs.c +++ b/drivers/usb/gadget/function/f_fs.c @@ -1016,7 +1016,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data) else ret = ep->status; goto error_mutex; - } else if (!(req = usb_ep_alloc_request(ep->ep, GFP_KERNEL))) { + } else if (!(req = usb_ep_alloc_request(ep->ep, GFP_ATOMIC))) { ret = -ENOMEM; } else { req->buf = data; From fc2f802193ccecf710ab608712b5b102dd9efc75 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 14 Nov 2017 14:42:57 -0500 Subject: [PATCH 0359/1013] fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall commit 4d2dc2cc766c3b51929658cacbc6e34fc8e242fb upstream. Currently, we're capping the values too low in the F_GETLK64 case. The fields in that structure are 64-bit values, so we shouldn't need to do any sort of fixup there. Make sure we check that assumption at build time in the future however by ensuring that the sizes we're copying will fit. With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it. Fixes: 94073ad77fff2 (fs/locks: don't mess with the address limit in compat_fcntl64) Reported-by: Vitaly Lipatov Signed-off-by: Jeff Layton Reviewed-by: David Howells Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/compat.h | 1 - arch/mips/include/asm/compat.h | 1 - arch/parisc/include/asm/compat.h | 1 - arch/powerpc/include/asm/compat.h | 1 - arch/s390/include/asm/compat.h | 1 - arch/sparc/include/asm/compat.h | 1 - arch/tile/include/asm/compat.h | 1 - arch/x86/include/asm/compat.h | 1 - fs/fcntl.c | 11 +++++------ 9 files changed, 5 insertions(+), 14 deletions(-) diff --git a/arch/arm64/include/asm/compat.h b/arch/arm64/include/asm/compat.h index e39d487bf724..a3c7f271ad4c 100644 --- a/arch/arm64/include/asm/compat.h +++ b/arch/arm64/include/asm/compat.h @@ -215,7 +215,6 @@ typedef struct compat_siginfo { } compat_siginfo_t; #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL /* * A pointer passed in from user mode. This should not diff --git a/arch/mips/include/asm/compat.h b/arch/mips/include/asm/compat.h index 8e2b5b556488..49691331ada4 100644 --- a/arch/mips/include/asm/compat.h +++ b/arch/mips/include/asm/compat.h @@ -200,7 +200,6 @@ typedef struct compat_siginfo { } compat_siginfo_t; #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL /* * A pointer passed in from user mode. 
This should not diff --git a/arch/parisc/include/asm/compat.h b/arch/parisc/include/asm/compat.h index 07f48827afda..acf8aa07cbe0 100644 --- a/arch/parisc/include/asm/compat.h +++ b/arch/parisc/include/asm/compat.h @@ -195,7 +195,6 @@ typedef struct compat_siginfo { } compat_siginfo_t; #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL struct compat_ipc64_perm { compat_key_t key; diff --git a/arch/powerpc/include/asm/compat.h b/arch/powerpc/include/asm/compat.h index a035b1e5dfa7..8a2aecfe9b02 100644 --- a/arch/powerpc/include/asm/compat.h +++ b/arch/powerpc/include/asm/compat.h @@ -185,7 +185,6 @@ typedef struct compat_siginfo { } compat_siginfo_t; #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL /* * A pointer passed in from user mode. This should not diff --git a/arch/s390/include/asm/compat.h b/arch/s390/include/asm/compat.h index 1b60eb3676d5..5e6a63641a5f 100644 --- a/arch/s390/include/asm/compat.h +++ b/arch/s390/include/asm/compat.h @@ -263,7 +263,6 @@ typedef struct compat_siginfo { #define si_overrun _sifields._timer._overrun #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL /* * A pointer passed in from user mode. This should not diff --git a/arch/sparc/include/asm/compat.h b/arch/sparc/include/asm/compat.h index 977c3f280ba1..fa38c78de0f0 100644 --- a/arch/sparc/include/asm/compat.h +++ b/arch/sparc/include/asm/compat.h @@ -209,7 +209,6 @@ typedef struct compat_siginfo { } compat_siginfo_t; #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL /* * A pointer passed in from user mode. This should not diff --git a/arch/tile/include/asm/compat.h b/arch/tile/include/asm/compat.h index c14e36f008c8..62a7b83025dd 100644 --- a/arch/tile/include/asm/compat.h +++ b/arch/tile/include/asm/compat.h @@ -173,7 +173,6 @@ typedef struct compat_siginfo { } compat_siginfo_t; #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL struct compat_ipc64_perm { compat_key_t key; diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h index 9eef9cc64c68..70bc1df580b2 100644 --- a/arch/x86/include/asm/compat.h +++ b/arch/x86/include/asm/compat.h @@ -209,7 +209,6 @@ typedef struct compat_siginfo { } compat_siginfo_t; #define COMPAT_OFF_T_MAX 0x7fffffff -#define COMPAT_LOFF_T_MAX 0x7fffffffffffffffL struct compat_ipc64_perm { compat_key_t key; diff --git a/fs/fcntl.c b/fs/fcntl.c index 6fd311367efc..0345a46b8856 100644 --- a/fs/fcntl.c +++ b/fs/fcntl.c @@ -563,6 +563,9 @@ static int put_compat_flock64(const struct flock *kfl, struct compat_flock64 __u { struct compat_flock64 fl; + BUILD_BUG_ON(sizeof(kfl->l_start) > sizeof(ufl->l_start)); + BUILD_BUG_ON(sizeof(kfl->l_len) > sizeof(ufl->l_len)); + memset(&fl, 0, sizeof(struct compat_flock64)); copy_flock_fields(&fl, kfl); if (copy_to_user(ufl, &fl, sizeof(struct compat_flock64))) @@ -641,12 +644,8 @@ COMPAT_SYSCALL_DEFINE3(fcntl64, unsigned int, fd, unsigned int, cmd, if (err) break; err = fcntl_getlk(f.file, convert_fcntl_cmd(cmd), &flock); - if (err) - break; - err = fixup_compat_flock(&flock); - if (err) - return err; - err = put_compat_flock64(&flock, compat_ptr(arg)); + if (!err) + err = put_compat_flock64(&flock, compat_ptr(arg)); break; case F_SETLK: case F_SETLKW: From e587b76e655441b7f3239dfb999c444c8573820c Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 5 Dec 2017 23:27:57 +0000 Subject: [PATCH 0360/1013] fix kcm_clone() commit a5739435b5a3b8c449f8844ecd71a3b1e89f0a33 
upstream. 1) it's fput() or sock_release(), not both 2) don't do fd_install() until the last failure exit. 3) not a bug per se, but... don't attach socket to struct file until it's set up. Take reserving descriptor into the caller, move fd_install() to the caller, sanitize failure exits and calling conventions. Acked-by: Tom Herbert Signed-off-by: Al Viro Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/kcm/kcmsock.c | 71 ++++++++++++++++++----------------------------- 1 file changed, 27 insertions(+), 44 deletions(-) diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index af4e76ac88ff..c5fa634e63ca 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -1625,60 +1625,35 @@ static struct proto kcm_proto = { }; /* Clone a kcm socket. */ -static int kcm_clone(struct socket *osock, struct kcm_clone *info, - struct socket **newsockp) +static struct file *kcm_clone(struct socket *osock) { struct socket *newsock; struct sock *newsk; - struct file *newfile; - int err, newfd; + struct file *file; - err = -ENFILE; newsock = sock_alloc(); if (!newsock) - goto out; + return ERR_PTR(-ENFILE); newsock->type = osock->type; newsock->ops = osock->ops; __module_get(newsock->ops->owner); - newfd = get_unused_fd_flags(0); - if (unlikely(newfd < 0)) { - err = newfd; - goto out_fd_fail; - } - - newfile = sock_alloc_file(newsock, 0, osock->sk->sk_prot_creator->name); - if (unlikely(IS_ERR(newfile))) { - err = PTR_ERR(newfile); - goto out_sock_alloc_fail; - } - newsk = sk_alloc(sock_net(osock->sk), PF_KCM, GFP_KERNEL, &kcm_proto, true); if (!newsk) { - err = -ENOMEM; - goto out_sk_alloc_fail; + sock_release(newsock); + return ERR_PTR(-ENOMEM); } - sock_init_data(newsock, newsk); init_kcm_sock(kcm_sk(newsk), kcm_sk(osock->sk)->mux); - fd_install(newfd, newfile); - *newsockp = newsock; - info->fd = newfd; - - return 0; + file = sock_alloc_file(newsock, 0, osock->sk->sk_prot_creator->name); + if (IS_ERR(file)) + sock_release(newsock); -out_sk_alloc_fail: - fput(newfile); -out_sock_alloc_fail: - put_unused_fd(newfd); -out_fd_fail: - sock_release(newsock); -out: - return err; + return file; } static int kcm_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) @@ -1708,17 +1683,25 @@ static int kcm_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) } case SIOCKCMCLONE: { struct kcm_clone info; - struct socket *newsock = NULL; - - err = kcm_clone(sock, &info, &newsock); - if (!err) { - if (copy_to_user((void __user *)arg, &info, - sizeof(info))) { - err = -EFAULT; - sys_close(info.fd); - } - } + struct file *file; + + info.fd = get_unused_fd_flags(0); + if (unlikely(info.fd < 0)) + return info.fd; + file = kcm_clone(sock); + if (IS_ERR(file)) { + put_unused_fd(info.fd); + return PTR_ERR(file); + } + if (copy_to_user((void __user *)arg, &info, + sizeof(info))) { + put_unused_fd(info.fd); + fput(file); + return -EFAULT; + } + fd_install(info.fd, file); + err = 0; break; } default: From d7a71904e6027136bea9a24cb30e75777d8d9256 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 16 Nov 2017 17:58:17 +0000 Subject: [PATCH 0361/1013] KVM: arm/arm64: vgic-its: Preserve the revious read from the pending table commit 64afe6e9eb4841f35317da4393de21a047a883b3 upstream. The current pending table parsing code assumes that we keep the previous read of the pending bits, but keep that variable in the current block, making sure it is discarded on each loop. We end-up using whatever is on the stack. Who knows, it might just be the right thing... 
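[ Illustrative sketch of the scoping problem described above; the loop shape and the helper name are hypothetical, not the vgic code. A value that must be carried across loop iterations has to be declared outside the loop body, otherwise every pass gets a fresh, indeterminate copy:

	u8 pendmask;	/* survives across iterations */
	int i;

	for (i = 0; i < nr_irqs; i++) {
		if (!(i % BITS_PER_BYTE))
			pendmask = read_pending_byte(i / BITS_PER_BYTE);
		/* consumers here see the byte from the most recent read */
	}

  Declaring "u8 pendmask;" inside the for body instead leaves it uninitialized on the iterations that skip the read, which is what the hunk below corrects by hoisting the declaration. ]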
Fixes: 33d3bc9556a7d ("KVM: arm64: vgic-its: Read initial LPI pending table") Reported-by: AKASHI Takahiro Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- virt/kvm/arm/vgic/vgic-its.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/virt/kvm/arm/vgic/vgic-its.c b/virt/kvm/arm/vgic/vgic-its.c index 3108e07526af..59ce2fb49821 100644 --- a/virt/kvm/arm/vgic/vgic-its.c +++ b/virt/kvm/arm/vgic/vgic-its.c @@ -393,6 +393,7 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu) int ret = 0; u32 *intids; int nr_irqs, i; + u8 pendmask; nr_irqs = vgic_copy_lpi_list(vcpu, &intids); if (nr_irqs < 0) @@ -400,7 +401,6 @@ static int its_sync_lpi_pending_table(struct kvm_vcpu *vcpu) for (i = 0; i < nr_irqs; i++) { int byte_offset, bit_nr; - u8 pendmask; byte_offset = intids[i] / BITS_PER_BYTE; bit_nr = intids[i] % BITS_PER_BYTE; From b0c08c89ea1c0f47423d1da0bcdcd182611823c6 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Thu, 12 Oct 2017 18:22:25 +0900 Subject: [PATCH 0362/1013] kbuild: do not call cc-option before KBUILD_CFLAGS initialization [ Upstream commit 433dc2ebe7d17dd21cba7ad5c362d37323592236 ] Some $(call cc-option,...) are invoked very early, even before KBUILD_CFLAGS, etc. are initialized. The returned string from $(call cc-option,...) depends on KBUILD_CPPFLAGS, KBUILD_CFLAGS, and GCC_PLUGINS_CFLAGS. Since they are exported, they are not empty when the top Makefile is recursively invoked. The recursion occurs in several places. For example, the top Makefile invokes itself for silentoldconfig. "make tinyconfig", "make rpm-pkg" are the cases, too. In those cases, the second call of cc-option from the same line runs a different shell command due to non-pristine KBUILD_CFLAGS. To get the same result all the time, KBUILD_* and GCC_PLUGINS_CFLAGS must be initialized before any call of cc-option. This avoids garbage data in the .cache.mk file. Move all calls of cc-option below the config targets because target compiler flags are unnecessary for Kconfig. Signed-off-by: Masahiro Yamada Reviewed-by: Douglas Anderson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- Makefile | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/Makefile b/Makefile index eabbd7748a24..793183f9e3e5 100644 --- a/Makefile +++ b/Makefile @@ -373,9 +373,6 @@ LDFLAGS_MODULE = CFLAGS_KERNEL = AFLAGS_KERNEL = LDFLAGS_vmlinux = -CFLAGS_GCOV := -fprofile-arcs -ftest-coverage -fno-tree-loop-im $(call cc-disable-warning,maybe-uninitialized,) -CFLAGS_KCOV := $(call cc-option,-fsanitize-coverage=trace-pc,) - # Use USERINCLUDE when you must reference the UAPI directories only. 
USERINCLUDE := \ @@ -394,21 +391,19 @@ LINUXINCLUDE := \ -I$(objtree)/include \ $(USERINCLUDE) -KBUILD_CPPFLAGS := -D__KERNEL__ - +KBUILD_AFLAGS := -D__ASSEMBLY__ KBUILD_CFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \ -fno-strict-aliasing -fno-common -fshort-wchar \ -Werror-implicit-function-declaration \ -Wno-format-security \ - -std=gnu89 $(call cc-option,-fno-PIE) - - + -std=gnu89 +KBUILD_CPPFLAGS := -D__KERNEL__ KBUILD_AFLAGS_KERNEL := KBUILD_CFLAGS_KERNEL := -KBUILD_AFLAGS := -D__ASSEMBLY__ $(call cc-option,-fno-PIE) KBUILD_AFLAGS_MODULE := -DMODULE KBUILD_CFLAGS_MODULE := -DMODULE KBUILD_LDFLAGS_MODULE := -T $(srctree)/scripts/module-common.lds +GCC_PLUGINS_CFLAGS := # Read KERNELRELEASE from include/config/kernel.release (if it exists) KERNELRELEASE = $(shell cat include/config/kernel.release 2> /dev/null) @@ -421,7 +416,7 @@ export MAKE AWK GENKSYMS INSTALLKERNEL PERL PYTHON UTS_MACHINE export HOSTCXX HOSTCXXFLAGS LDFLAGS_MODULE CHECK CHECKFLAGS export KBUILD_CPPFLAGS NOSTDINC_FLAGS LINUXINCLUDE OBJCOPYFLAGS LDFLAGS -export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_GCOV CFLAGS_KCOV CFLAGS_KASAN CFLAGS_UBSAN +export KBUILD_CFLAGS CFLAGS_KERNEL CFLAGS_MODULE CFLAGS_KASAN CFLAGS_UBSAN export KBUILD_AFLAGS AFLAGS_KERNEL AFLAGS_MODULE export KBUILD_AFLAGS_MODULE KBUILD_CFLAGS_MODULE KBUILD_LDFLAGS_MODULE export KBUILD_AFLAGS_KERNEL KBUILD_CFLAGS_KERNEL @@ -622,6 +617,12 @@ endif # Defaults to vmlinux, but the arch makefile usually adds further targets all: vmlinux +KBUILD_CFLAGS += $(call cc-option,-fno-PIE) +KBUILD_AFLAGS += $(call cc-option,-fno-PIE) +CFLAGS_GCOV := -fprofile-arcs -ftest-coverage -fno-tree-loop-im $(call cc-disable-warning,maybe-uninitialized,) +CFLAGS_KCOV := $(call cc-option,-fsanitize-coverage=trace-pc,) +export CFLAGS_GCOV CFLAGS_KCOV + # The arch Makefile can set ARCH_{CPP,A,C}FLAGS to override the default # values of the respective KBUILD_* variables ARCH_CPPFLAGS := From 854b27edf7b0672e5181e75274dc734863c89f2e Mon Sep 17 00:00:00 2001 From: Vaidyanathan Srinivasan Date: Thu, 24 Aug 2017 00:28:41 +0530 Subject: [PATCH 0363/1013] powerpc/powernv/idle: Round up latency and residency values [ Upstream commit 8d4e10e9ed9450e18fbbf6a8872be0eac9fd4999 ] On PowerNV platforms, firmware provides exit latency and target residency for each of the idle states in nano seconds. Cpuidle framework expects the values in micro seconds. Round up to nearest micro seconds to avoid errors in cases where the values are defined as fractional micro seconds. Default idle state of 'snooze' has exit latency of zero. If other states have fractional micro second exit latency, they would get rounded down to zero micro second and make cpuidle framework choose deeper idle state when snooze loop is the right choice. Reported-by: Anton Blanchard Signed-off-by: Vaidyanathan Srinivasan Reviewed-by: Gautham R. Shenoy Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/cpuidle/cpuidle-powernv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cpuidle/cpuidle-powernv.c b/drivers/cpuidle/cpuidle-powernv.c index ed6531f075c6..e06605b21841 100644 --- a/drivers/cpuidle/cpuidle-powernv.c +++ b/drivers/cpuidle/cpuidle-powernv.c @@ -384,9 +384,9 @@ static int powernv_add_idle_states(void) * Firmware passes residency and latency values in ns. * cpuidle expects it in us. 
*/ - exit_latency = latency_ns[i] / 1000; + exit_latency = DIV_ROUND_UP(latency_ns[i], 1000); if (!rc) - target_residency = residency_ns[i] / 1000; + target_residency = DIV_ROUND_UP(residency_ns[i], 1000); else target_residency = 0; From 7015ca81bc8041c17f2d45943703c5b887207d5d Mon Sep 17 00:00:00 2001 From: Keefe Liu Date: Thu, 9 Nov 2017 20:09:31 +0800 Subject: [PATCH 0364/1013] ipvlan: fix ipv6 outbound device [ Upstream commit ca29fd7cce5a6444d57fb86517589a1a31c759e1 ] When process the outbound packet of ipv6, we should assign the master device to output device other than input device. Signed-off-by: Keefe Liu Acked-by: Mahesh Bandewar Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ipvlan/ipvlan_core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c index 1f3295e274d0..8feb84fd4ca7 100644 --- a/drivers/net/ipvlan/ipvlan_core.c +++ b/drivers/net/ipvlan/ipvlan_core.c @@ -409,7 +409,7 @@ static int ipvlan_process_v6_outbound(struct sk_buff *skb) struct dst_entry *dst; int err, ret = NET_XMIT_DROP; struct flowi6 fl6 = { - .flowi6_iif = dev->ifindex, + .flowi6_oif = dev->ifindex, .daddr = ip6h->daddr, .saddr = ip6h->saddr, .flowi6_flags = FLOWI_FLAG_ANYSRC, From bd099ef951dd15883c010e044b3c9de4276dd5ae Mon Sep 17 00:00:00 2001 From: Hongxu Jia Date: Fri, 10 Nov 2017 15:59:17 +0800 Subject: [PATCH 0365/1013] ide: ide-atapi: fix compile error with defining macro DEBUG [ Upstream commit 8dc7a31fbce5e2dbbacd83d910da37105181b054 ] Compile ide-atapi failed with defining macro "DEBUG" ... |drivers/ide/ide-atapi.c:285:52: error: 'struct request' has no member named 'cmd'; did you mean 'csd'? | debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]); ... Since we split the scsi_request out of struct request, it missed do the same thing on debug_log Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request") Signed-off-by: Hongxu Jia Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/ide/ide-atapi.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/ide/ide-atapi.c b/drivers/ide/ide-atapi.c index 14d1e7d9a1d6..0e6bc631a1ca 100644 --- a/drivers/ide/ide-atapi.c +++ b/drivers/ide/ide-atapi.c @@ -282,7 +282,7 @@ int ide_cd_expiry(ide_drive_t *drive) struct request *rq = drive->hwif->rq; unsigned long wait = 0; - debug_log("%s: rq->cmd[0]: 0x%x\n", __func__, rq->cmd[0]); + debug_log("%s: scsi_req(rq)->cmd[0]: 0x%x\n", __func__, scsi_req(rq)->cmd[0]); /* * Some commands are *slow* and normally take a long time to complete. 
@@ -463,7 +463,7 @@ static ide_startstop_t ide_pc_intr(ide_drive_t *drive) return ide_do_reset(drive); } - debug_log("[cmd %x]: check condition\n", rq->cmd[0]); + debug_log("[cmd %x]: check condition\n", scsi_req(rq)->cmd[0]); /* Retry operation */ ide_retry_pc(drive); @@ -531,7 +531,7 @@ static ide_startstop_t ide_pc_intr(ide_drive_t *drive) ide_pad_transfer(drive, write, bcount); debug_log("[cmd %x] transferred %d bytes, padded %d bytes, resid: %u\n", - rq->cmd[0], done, bcount, scsi_req(rq)->resid_len); + scsi_req(rq)->cmd[0], done, bcount, scsi_req(rq)->resid_len); /* And set the interrupt handler again */ ide_set_handler(drive, ide_pc_intr, timeout); From a4000d951e3723d5aff6dd4ce069c5908365dc24 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 8 Nov 2017 10:23:45 -0800 Subject: [PATCH 0366/1013] blk-mq: Avoid that request queue removal can trigger list corruption [ Upstream commit aba7afc5671c23beade64d10caf86e24a9105dab ] Avoid that removal of a request queue sporadically triggers the following warning: list_del corruption. next->prev should be ffff8807d649b970, but was 6b6b6b6b6b6b6b6b WARNING: CPU: 3 PID: 342 at lib/list_debug.c:56 __list_del_entry_valid+0x92/0xa0 Call Trace: process_one_work+0x11b/0x660 worker_thread+0x3d/0x3b0 kthread+0x129/0x140 ret_from_fork+0x27/0x40 Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- block/blk-core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/block/blk-core.c b/block/blk-core.c index 516ce3174683..7b30bf10b1d4 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -339,6 +339,7 @@ void blk_sync_queue(struct request_queue *q) struct blk_mq_hw_ctx *hctx; int i; + cancel_delayed_work_sync(&q->requeue_work); queue_for_each_hw_ctx(q, hctx, i) cancel_delayed_work_sync(&hctx->run_work); } else { From 7536280f9c949a97e06381a6137412c06838c3e7 Mon Sep 17 00:00:00 2001 From: Israel Rukshin Date: Sun, 5 Nov 2017 08:43:01 +0000 Subject: [PATCH 0367/1013] nvmet-rdma: update queue list during ib_device removal [ Upstream commit 43b92fd27aaef0f529c9321cfebbaec1d7b8f503 ] A NULL deref happens when nvmet_rdma_remove_one() is called more than once (e.g. while connected via 2 ports). The first call frees the queues related to the first ib_device but doesn't remove them from the queue list. While calling nvmet_rdma_remove_one() for the second ib_device it goes over the full queue list again and we get the NULL deref. 
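[ Illustrative sketch of the removal pattern the fix below applies; the names are generic placeholders, not the nvmet-rdma code. When entries are unlinked and freed while walking a list, the iteration needs the _safe variant so the next pointer is sampled before the current entry goes away, and each matching entry must be unlinked before it is torn down:

	struct entry *e, *tmp;

	list_for_each_entry_safe(e, tmp, &entry_list, list_node) {
		if (e->device != removed_device)
			continue;
		list_del_init(&e->list_node);	/* unlink first */
		teardown(e);			/* then tear down */
	}

  Freeing a queue while leaving it on the list is what made the second removal pass trip over stale entries. ]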
Fixes: f1d4ef7d ("nvmet-rdma: register ib_client to not deadlock in device removal") Signed-off-by: Israel Rukshin Reviewed-by: Max Gurtovoy Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/nvme/target/rdma.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 76d2bb793afe..3333d417b248 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -1512,15 +1512,17 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = { static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data) { - struct nvmet_rdma_queue *queue; + struct nvmet_rdma_queue *queue, *tmp; /* Device is being removed, delete all queues using this device */ mutex_lock(&nvmet_rdma_queue_mutex); - list_for_each_entry(queue, &nvmet_rdma_queue_list, queue_list) { + list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list, + queue_list) { if (queue->dev->device != ib_device) continue; pr_info("Removing queue %d\n", queue->idx); + list_del_init(&queue->queue_list); __nvmet_rdma_queue_disconnect(queue); } mutex_unlock(&nvmet_rdma_queue_mutex); From 4086f7cf0c3e2fe275a2a18dc25749df348c0cdb Mon Sep 17 00:00:00 2001 From: Steve Grubb Date: Tue, 17 Oct 2017 18:29:22 -0400 Subject: [PATCH 0368/1013] audit: Allow auditd to set pid to 0 to end auditing [ Upstream commit 33e8a907804428109ce1d12301c3365d619cc4df ] The API to end auditing has historically been for auditd to set the pid to 0. This patch restores that functionality. See: https://github.com/linux-audit/audit-kernel/issues/69 Reviewed-by: Richard Guy Briggs Signed-off-by: Steve Grubb Signed-off-by: Paul Moore Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/audit.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) diff --git a/kernel/audit.c b/kernel/audit.c index be1c28fd4d57..d779326e53c0 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -1197,25 +1197,28 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh) pid_t auditd_pid; struct pid *req_pid = task_tgid(current); - /* sanity check - PID values must match */ - if (new_pid != pid_vnr(req_pid)) + /* Sanity check - PID values must match. Setting + * pid to 0 is how auditd ends auditing. 
*/ + if (new_pid && (new_pid != pid_vnr(req_pid))) return -EINVAL; /* test the auditd connection */ audit_replace(req_pid); auditd_pid = auditd_pid_vnr(); - /* only the current auditd can unregister itself */ - if ((!new_pid) && (new_pid != auditd_pid)) { - audit_log_config_change("audit_pid", new_pid, - auditd_pid, 0); - return -EACCES; - } - /* replacing a healthy auditd is not allowed */ - if (auditd_pid && new_pid) { - audit_log_config_change("audit_pid", new_pid, - auditd_pid, 0); - return -EEXIST; + if (auditd_pid) { + /* replacing a healthy auditd is not allowed */ + if (new_pid) { + audit_log_config_change("audit_pid", + new_pid, auditd_pid, 0); + return -EEXIST; + } + /* only current auditd can unregister itself */ + if (pid_vnr(req_pid) != auditd_pid) { + audit_log_config_change("audit_pid", + new_pid, auditd_pid, 0); + return -EACCES; + } } if (new_pid) { From 0ad0bb60166d8e4fbacaaaaaeb10a24de5e99aff Mon Sep 17 00:00:00 2001 From: Paul Moore Date: Fri, 1 Sep 2017 09:44:34 -0400 Subject: [PATCH 0369/1013] audit: ensure that 'audit=1' actually enables audit for PID 1 [ Upstream commit 173743dd99a49c956b124a74c8aacb0384739a4c ] Prior to this patch we enabled audit in audit_init(), which is too late for PID 1 as the standard initcalls are run after the PID 1 task is forked. This means that we never allocate an audit_context (see audit_alloc()) for PID 1 and therefore miss a lot of audit events generated by PID 1. This patch enables audit as early as possible to help ensure that when PID 1 is forked it can allocate an audit_context if required. Reviewed-by: Richard Guy Briggs Signed-off-by: Paul Moore Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/audit.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/kernel/audit.c b/kernel/audit.c index d779326e53c0..5b34d3114af4 100644 --- a/kernel/audit.c +++ b/kernel/audit.c @@ -85,13 +85,13 @@ static int audit_initialized; #define AUDIT_OFF 0 #define AUDIT_ON 1 #define AUDIT_LOCKED 2 -u32 audit_enabled; -u32 audit_ever_enabled; +u32 audit_enabled = AUDIT_OFF; +u32 audit_ever_enabled = !!AUDIT_OFF; EXPORT_SYMBOL_GPL(audit_enabled); /* Default state when kernel boots without any parameters. */ -static u32 audit_default; +static u32 audit_default = AUDIT_OFF; /* If auditing cannot proceed, audit_failure selects what happens. */ static u32 audit_failure = AUDIT_FAIL_PRINTK; @@ -1552,8 +1552,6 @@ static int __init audit_init(void) register_pernet_subsys(&audit_net_ops); audit_initialized = AUDIT_INITIALIZED; - audit_enabled = audit_default; - audit_ever_enabled |= !!audit_default; kauditd_task = kthread_run(kauditd_thread, NULL, "kauditd"); if (IS_ERR(kauditd_task)) { @@ -1575,6 +1573,8 @@ static int __init audit_enable(char *str) audit_default = !!simple_strtol(str, NULL, 0); if (!audit_default) audit_initialized = AUDIT_DISABLED; + audit_enabled = audit_default; + audit_ever_enabled = !!audit_enabled; pr_info("%s\n", audit_default ? 
"enabled (after initialization)" : "disabled (until reboot)"); From 2c727856d07198ca0c0d054480e608f0f759f6c5 Mon Sep 17 00:00:00 2001 From: Heinz Mauelshagen Date: Thu, 2 Nov 2017 19:58:28 +0100 Subject: [PATCH 0370/1013] dm raid: fix panic when attempting to force a raid to sync [ Upstream commit 233978449074ca7e45d9c959f9ec612d1b852893 ] Requesting a sync on an active raid device via a table reload (see 'sync' parameter in Documentation/device-mapper/dm-raid.txt) skips the super_load() call that defines the superblock size (rdev->sb_size) -- resulting in an oops if/when super_sync()->memset() is called. Fix by moving the initialization of the superblock start and size out of super_load() to the caller (analyse_superblocks). Signed-off-by: Heinz Mauelshagen Signed-off-by: Mike Snitzer Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-raid.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c index 2245d06d2045..a25eebd98996 100644 --- a/drivers/md/dm-raid.c +++ b/drivers/md/dm-raid.c @@ -2143,13 +2143,6 @@ static int super_load(struct md_rdev *rdev, struct md_rdev *refdev) struct dm_raid_superblock *refsb; uint64_t events_sb, events_refsb; - rdev->sb_start = 0; - rdev->sb_size = bdev_logical_block_size(rdev->meta_bdev); - if (rdev->sb_size < sizeof(*sb) || rdev->sb_size > PAGE_SIZE) { - DMERR("superblock size of a logical block is no longer valid"); - return -EINVAL; - } - r = read_disk_sb(rdev, rdev->sb_size, false); if (r) return r; @@ -2494,6 +2487,17 @@ static int analyse_superblocks(struct dm_target *ti, struct raid_set *rs) if (test_bit(Journal, &rdev->flags)) continue; + if (!rdev->meta_bdev) + continue; + + /* Set superblock offset/size for metadata device. */ + rdev->sb_start = 0; + rdev->sb_size = bdev_logical_block_size(rdev->meta_bdev); + if (rdev->sb_size < sizeof(struct dm_raid_superblock) || rdev->sb_size > PAGE_SIZE) { + DMERR("superblock size of a logical block is no longer valid"); + return -EINVAL; + } + /* * Skipping super_load due to CTR_FLAG_SYNC will cause * the array to undergo initialization again as @@ -2506,9 +2510,6 @@ static int analyse_superblocks(struct dm_target *ti, struct raid_set *rs) if (test_bit(__CTR_FLAG_SYNC, &rs->ctr_flags)) continue; - if (!rdev->meta_bdev) - continue; - r = super_load(rdev, freshest); switch (r) { From 89a459e0d519b289c15e4da8f52ff5528c7b8a3c Mon Sep 17 00:00:00 2001 From: Zdenek Kabelac Date: Wed, 8 Nov 2017 13:44:56 +0100 Subject: [PATCH 0371/1013] md: free unused memory after bitmap resize [ Upstream commit 0868b99c214a3d55486c700de7c3f770b7243e7c ] When bitmap is resized, the old kalloced chunks just are not released once the resized bitmap starts to use new space. This fixes in particular kmemleak reports like this one: unreferenced object 0xffff8f4311e9c000 (size 4096): comm "lvm", pid 19333, jiffies 4295263268 (age 528.265s) hex dump (first 32 bytes): 02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................ 02 80 02 80 02 80 02 80 02 80 02 80 02 80 02 80 ................ 
backtrace: [] kmemleak_alloc+0x4a/0xa0 [] kmem_cache_alloc_trace+0x14e/0x2e0 [] bitmap_checkpage+0x7c/0x110 [] bitmap_get_counter+0x45/0xd0 [] bitmap_set_memory_bits+0x43/0xe0 [] bitmap_init_from_disk+0x23c/0x530 [] bitmap_load+0xbe/0x160 [] raid_preresume+0x203/0x2f0 [dm_raid] [] dm_table_resume_targets+0x4f/0xe0 [] dm_resume+0x122/0x140 [] dev_suspend+0x18f/0x290 [] ctl_ioctl+0x287/0x560 [] dm_ctl_ioctl+0x13/0x20 [] do_vfs_ioctl+0xa6/0x750 [] SyS_ioctl+0x79/0x90 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 Signed-off-by: Zdenek Kabelac Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/bitmap.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/md/bitmap.c b/drivers/md/bitmap.c index f425905c97fa..0cabf31fb163 100644 --- a/drivers/md/bitmap.c +++ b/drivers/md/bitmap.c @@ -2158,6 +2158,7 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks, for (k = 0; k < page; k++) { kfree(new_bp[k].map); } + kfree(new_bp); /* restore some fields from old_counts */ bitmap->counts.bp = old_counts.bp; @@ -2208,6 +2209,14 @@ int bitmap_resize(struct bitmap *bitmap, sector_t blocks, block += old_blocks; } + if (bitmap->counts.bp != old_counts.bp) { + unsigned long k; + for (k = 0; k < old_counts.pages; k++) + if (!old_counts.bp[k].hijacked) + kfree(old_counts.bp[k].map); + kfree(old_counts.bp); + } + if (!init) { int i; while (block < (chunks << chunkshift)) { From f0a7d7b4298ce318fe07266af899a9b7c4227ad6 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 25 Oct 2017 23:10:19 +0300 Subject: [PATCH 0372/1013] RDMA/cxgb4: Annotate r2 and stag as __be32 [ Upstream commit 7d7d065a5eec7e218174d5c64a9f53f99ffdb119 ] Chelsio cxgb4 HW is big-endian, hence there is need to properly annotate r2 and stag fields as __be32 and not __u32 to fix the following sparse warnings. 
drivers/infiniband/hw/cxgb4/qp.c:614:16: warning: incorrect type in assignment (different base types) expected unsigned int [unsigned] [usertype] r2 got restricted __be32 [usertype] drivers/infiniband/hw/cxgb4/qp.c:615:18: warning: incorrect type in assignment (different base types) expected unsigned int [unsigned] [usertype] stag got restricted __be32 [usertype] Cc: Steve Wise Signed-off-by: Leon Romanovsky Reviewed-by: Steve Wise Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/cxgb4/t4fw_ri_api.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h index 010c709ba3bb..58c531db4f4a 100644 --- a/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h +++ b/drivers/infiniband/hw/cxgb4/t4fw_ri_api.h @@ -675,8 +675,8 @@ struct fw_ri_fr_nsmr_tpte_wr { __u16 wrid; __u8 r1[3]; __u8 len16; - __u32 r2; - __u32 stag; + __be32 r2; + __be32 stag; struct fw_ri_tpte tpte; __u64 pbl[2]; }; From 93e2d845f265f8ff006f6ac5134f038ad382614e Mon Sep 17 00:00:00 2001 From: Reinette Chatre Date: Fri, 20 Oct 2017 02:16:58 -0700 Subject: [PATCH 0373/1013] x86/intel_rdt: Fix potential deadlock during resctrl unmount [ Upstream commit 36b6f9fcb8928c06b6638a4cf91bc9d69bb49aa2 ] Lockdep warns about a potential deadlock: [ 66.782842] ====================================================== [ 66.782888] WARNING: possible circular locking dependency detected [ 66.782937] 4.14.0-rc2-test-test+ #48 Not tainted [ 66.782983] ------------------------------------------------------ [ 66.783052] umount/336 is trying to acquire lock: [ 66.783117] (cpu_hotplug_lock.rw_sem){++++}, at: [] rdt_kill_sb+0x215/0x390 [ 66.783193] but task is already holding lock: [ 66.783244] (rdtgroup_mutex){+.+.}, at: [] rdt_kill_sb+0x36/0x390 [ 66.783305] which lock already depends on the new lock. 
[ 66.783364] the existing dependency chain (in reverse order) is: [ 66.783419] -> #3 (rdtgroup_mutex){+.+.}: [ 66.783467] __lock_acquire+0x1293/0x13f0 [ 66.783509] lock_acquire+0xaf/0x220 [ 66.783543] __mutex_lock+0x71/0x9b0 [ 66.783575] mutex_lock_nested+0x1b/0x20 [ 66.783610] intel_rdt_online_cpu+0x3b/0x430 [ 66.783649] cpuhp_invoke_callback+0xab/0x8e0 [ 66.783687] cpuhp_thread_fun+0x7a/0x150 [ 66.783722] smpboot_thread_fn+0x1cc/0x270 [ 66.783764] kthread+0x16e/0x190 [ 66.783794] ret_from_fork+0x27/0x40 [ 66.783825] -> #2 (cpuhp_state){+.+.}: [ 66.783870] __lock_acquire+0x1293/0x13f0 [ 66.783906] lock_acquire+0xaf/0x220 [ 66.783938] cpuhp_issue_call+0x102/0x170 [ 66.783974] __cpuhp_setup_state_cpuslocked+0x154/0x2a0 [ 66.784023] __cpuhp_setup_state+0xc7/0x170 [ 66.784061] page_writeback_init+0x43/0x67 [ 66.784097] pagecache_init+0x43/0x4a [ 66.784131] start_kernel+0x3ad/0x3f7 [ 66.784165] x86_64_start_reservations+0x2a/0x2c [ 66.784204] x86_64_start_kernel+0x72/0x75 [ 66.784241] verify_cpu+0x0/0xfb [ 66.784270] -> #1 (cpuhp_state_mutex){+.+.}: [ 66.784319] __lock_acquire+0x1293/0x13f0 [ 66.784355] lock_acquire+0xaf/0x220 [ 66.784387] __mutex_lock+0x71/0x9b0 [ 66.784419] mutex_lock_nested+0x1b/0x20 [ 66.784454] __cpuhp_setup_state_cpuslocked+0x52/0x2a0 [ 66.784497] __cpuhp_setup_state+0xc7/0x170 [ 66.784535] page_alloc_init+0x28/0x30 [ 66.784569] start_kernel+0x148/0x3f7 [ 66.784602] x86_64_start_reservations+0x2a/0x2c [ 66.784642] x86_64_start_kernel+0x72/0x75 [ 66.784678] verify_cpu+0x0/0xfb [ 66.784707] -> #0 (cpu_hotplug_lock.rw_sem){++++}: [ 66.784759] check_prev_add+0x32f/0x6e0 [ 66.784794] __lock_acquire+0x1293/0x13f0 [ 66.784830] lock_acquire+0xaf/0x220 [ 66.784863] cpus_read_lock+0x3d/0xb0 [ 66.784896] rdt_kill_sb+0x215/0x390 [ 66.784930] deactivate_locked_super+0x3e/0x70 [ 66.784968] deactivate_super+0x40/0x60 [ 66.785003] cleanup_mnt+0x3f/0x80 [ 66.785034] __cleanup_mnt+0x12/0x20 [ 66.785070] task_work_run+0x8b/0xc0 [ 66.785103] exit_to_usermode_loop+0x94/0xa0 [ 66.786804] syscall_return_slowpath+0xe8/0x150 [ 66.788502] entry_SYSCALL_64_fastpath+0xab/0xad [ 66.790194] other info that might help us debug this: [ 66.795139] Chain exists of: cpu_hotplug_lock.rw_sem --> cpuhp_state --> rdtgroup_mutex [ 66.800035] Possible unsafe locking scenario: [ 66.803267] CPU0 CPU1 [ 66.804867] ---- ---- [ 66.806443] lock(rdtgroup_mutex); [ 66.808002] lock(cpuhp_state); [ 66.809565] lock(rdtgroup_mutex); [ 66.811110] lock(cpu_hotplug_lock.rw_sem); [ 66.812608] *** DEADLOCK *** [ 66.816983] 2 locks held by umount/336: [ 66.818418] #0: (&type->s_umount_key#35){+.+.}, at: [] deactivate_super+0x38/0x60 [ 66.819922] #1: (rdtgroup_mutex){+.+.}, at: [] rdt_kill_sb+0x36/0x390 When the resctrl filesystem is unmounted the locks should be obtain in the locks in the same order as was done when the cpus came online: cpu_hotplug_lock before rdtgroup_mutex. This also requires to switch the static_branch_disable() calls to the _cpulocked variant because now cpu hotplug lock is held already. 
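[ Illustrative condensation of the lock ordering the hunk below establishes in rdt_kill_sb(); the body and error handling are elided. Both the CPU online callback and the unmount path now take cpu_hotplug_lock before rdtgroup_mutex, which removes the circular dependency reported by lockdep:

	cpus_read_lock();			/* cpu_hotplug_lock first */
	mutex_lock(&rdtgroup_mutex);		/* then rdtgroup_mutex */
	/* reset controls, disable the *_cpuslocked static keys, ... */
	mutex_unlock(&rdtgroup_mutex);
	cpus_read_unlock();
]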
[ tglx: Switched to cpus_read_[un]lock ] Signed-off-by: Reinette Chatre Signed-off-by: Thomas Gleixner Tested-by: Sai Praneeth Prakhya Acked-by: Vikas Shivappa Acked-by: Fenghua Yu Acked-by: Tony Luck Link: https://lkml.kernel.org/r/cc292e76be073f7260604651711c47b09fd0dc81.1508490116.git.reinette.chatre@intel.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/intel_rdt_rdtgroup.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c index 3d433af856a5..7be35b600299 100644 --- a/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c +++ b/arch/x86/kernel/cpu/intel_rdt_rdtgroup.c @@ -1297,9 +1297,7 @@ static void rmdir_all_sub(void) kfree(rdtgrp); } /* Notify online CPUs to update per cpu storage and PQR_ASSOC MSR */ - get_online_cpus(); update_closid_rmid(cpu_online_mask, &rdtgroup_default); - put_online_cpus(); kernfs_remove(kn_info); kernfs_remove(kn_mongrp); @@ -1310,6 +1308,7 @@ static void rdt_kill_sb(struct super_block *sb) { struct rdt_resource *r; + cpus_read_lock(); mutex_lock(&rdtgroup_mutex); /*Put everything back to default values. */ @@ -1317,11 +1316,12 @@ static void rdt_kill_sb(struct super_block *sb) reset_all_ctrls(r); cdp_disable(); rmdir_all_sub(); - static_branch_disable(&rdt_alloc_enable_key); - static_branch_disable(&rdt_mon_enable_key); - static_branch_disable(&rdt_enable_key); + static_branch_disable_cpuslocked(&rdt_alloc_enable_key); + static_branch_disable_cpuslocked(&rdt_mon_enable_key); + static_branch_disable_cpuslocked(&rdt_enable_key); kernfs_kill_sb(sb); mutex_unlock(&rdtgroup_mutex); + cpus_read_unlock(); } static struct file_system_type rdt_fs_type = { From ce344631dc13717314184ebd47f5e8b0ff3cfcab Mon Sep 17 00:00:00 2001 From: Daniel Scheller Date: Sun, 29 Oct 2017 11:43:22 -0400 Subject: [PATCH 0374/1013] media: dvb-core: always call invoke_release() in fe_free() commit 62229de19ff2b7f3e0ebf4d48ad99061127d0281 upstream. Follow-up to: ead666000a5f ("media: dvb_frontend: only use kref after initialized") The aforementioned commit fixed refcount OOPSes when demod driver attaching succeeded but tuner driver didn't. However, the use count of the attached demod drivers don't go back to zero and thus couldn't be cleanly unloaded. Improve on this by calling dvb_frontend_invoke_release() in __dvb_frontend_free() regardless of fepriv being NULL, instead of returning when fepriv is NULL. This is safe to do since _invoke_release() will check for passed pointers being valid before calling the .release() function. 
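A condensed user-space sketch of the teardown shape this moves to (struct fe, fe_priv and the helpers are hypothetical, assuming a driver release callback that is safe to invoke unconditionally): the release hook runs whether or not the private data was ever allocated, and only the private-data frees stay behind the NULL check.

#include <stdio.h>
#include <stdlib.h>

struct fe_priv { void *dev; };

struct fe {
	struct fe_priv *priv;            /* may be NULL if attach failed early */
	void (*release)(struct fe *fe);  /* driver callback, may be NULL */
};

static void invoke_release(struct fe *fe)
{
	if (fe->release)                 /* pointer is checked before the call */
		fe->release(fe);
}

static void fe_free(struct fe *fe)
{
	if (fe->priv)                    /* device node exists only with priv */
		free(fe->priv->dev);

	invoke_release(fe);              /* always drop the driver's resources */

	if (fe->priv)
		free(fe->priv);
	free(fe);
}

static void demo_release(struct fe *fe)
{
	printf("driver release for %p\n", (void *)fe);
}

int main(void)
{
	struct fe *fe = calloc(1, sizeof(*fe));

	fe->release = demo_release;
	fe_free(fe);                     /* priv == NULL: release still runs */
	return 0;
}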
[mchehab@s-opensource.com: changed the logic a little bit to reduce conflicts with another bug fix patch under review] Fixes: ead666000a5f ("media: dvb_frontend: only use kref after initialized") Signed-off-by: Daniel Scheller Signed-off-by: Mauro Carvalho Chehab Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- drivers/media/dvb-core/dvb_frontend.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c index 9139d01ba7ed..0a8234846699 100644 --- a/drivers/media/dvb-core/dvb_frontend.c +++ b/drivers/media/dvb-core/dvb_frontend.c @@ -145,13 +145,14 @@ static void __dvb_frontend_free(struct dvb_frontend *fe) { struct dvb_frontend_private *fepriv = fe->frontend_priv; - if (!fepriv) - return; - - dvb_free_device(fepriv->dvbdev); + if (fepriv) + dvb_free_device(fepriv->dvbdev); dvb_frontend_invoke_release(fe, fe->ops.release); + if (!fepriv) + return; + kfree(fepriv); fe->frontend_priv = NULL; } From 7bc8eb30f1e02b4dd6fd2869720c64d9bf39d765 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 7 Nov 2017 08:39:39 -0500 Subject: [PATCH 0375/1013] dvb_frontend: don't use-after-free the frontend struct commit b1cb7372fa822af6c06c8045963571d13ad6348b upstream. dvb_frontend_invoke_release() may free the frontend struct. So, the free logic can't update it anymore after calling it. That's OK, as __dvb_frontend_free() is called only when the krefs are zeroed, so nobody is using it anymore. That should fix the following KASAN error: The KASAN report looks like this (running on kernel 3e0cc09a3a2c40ec1ffb6b4e12da86e98feccb11 (4.14-rc5+)): ================================================================== BUG: KASAN: use-after-free in __dvb_frontend_free+0x113/0x120 Write of size 8 at addr ffff880067d45a00 by task kworker/0:1/24 CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.14.0-rc5-43687-g06ab8a23e0e6 #545 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:16 dump_stack+0x292/0x395 lib/dump_stack.c:52 print_address_description+0x78/0x280 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 kasan_report+0x23d/0x350 mm/kasan/report.c:409 __asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435 __dvb_frontend_free+0x113/0x120 drivers/media/dvb-core/dvb_frontend.c:156 dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176 dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803 dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340 dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116 dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132 dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295 usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423 __device_release_driver drivers/base/dd.c:861 device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893 device_release_driver+0x1e/0x30 drivers/base/dd.c:918 bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565 device_del+0x5c4/0xab0 drivers/base/core.c:1985 usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170 usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124 hub_port_connect drivers/usb/core/hub.c:4754 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119 
worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x363/0x440 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Allocated by task 24: save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kmem_cache_alloc_trace+0x11e/0x2d0 mm/slub.c:2772 kmalloc ./include/linux/slab.h:493 kzalloc ./include/linux/slab.h:666 dtt200u_fe_attach+0x4c/0x110 drivers/media/usb/dvb-usb/dtt200u-fe.c:212 dtt200u_frontend_attach+0x35/0x80 drivers/media/usb/dvb-usb/dtt200u.c:136 dvb_usb_adapter_frontend_init+0x32b/0x660 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:286 dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:86 dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:162 dvb_usb_device_init+0xf73/0x17f0 drivers/media/usb/dvb-usb/dvb-usb-init.c:277 dtt200u_usb_probe+0xa1/0xe0 drivers/media/usb/dvb-usb/dtt200u.c:155 usb_probe_interface+0x35d/0x8e0 drivers/usb/core/driver.c:361 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26b/0x3c0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_set_configuration+0x104e/0x1870 drivers/usb/core/message.c:1932 generic_probe+0x73/0xe0 drivers/usb/core/generic.c:174 usb_probe_device+0xaf/0xe0 drivers/usb/core/driver.c:266 really_probe drivers/base/dd.c:413 driver_probe_device+0x610/0xa00 drivers/base/dd.c:557 __device_attach_driver+0x230/0x290 drivers/base/dd.c:653 bus_for_each_drv+0x161/0x210 drivers/base/bus.c:463 __device_attach+0x26b/0x3c0 drivers/base/dd.c:710 device_initial_probe+0x1f/0x30 drivers/base/dd.c:757 bus_probe_device+0x1eb/0x290 drivers/base/bus.c:523 device_add+0xd0b/0x1660 drivers/base/core.c:1835 usb_new_device+0x7b8/0x1020 drivers/usb/core/hub.c:2457 hub_port_connect drivers/usb/core/hub.c:4903 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x194d/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119 worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x363/0x440 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 Freed by task 24: save_stack_trace+0x1b/0x20 arch/x86/kernel/stacktrace.c:59 save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 kasan_slab_free+0x72/0xc0 mm/kasan/kasan.c:524 slab_free_hook mm/slub.c:1390 slab_free_freelist_hook mm/slub.c:1412 slab_free mm/slub.c:2988 kfree+0xf6/0x2f0 mm/slub.c:3919 dtt200u_fe_release+0x3c/0x50 drivers/media/usb/dvb-usb/dtt200u-fe.c:202 dvb_frontend_invoke_release.part.13+0x1c/0x30 drivers/media/dvb-core/dvb_frontend.c:2790 dvb_frontend_invoke_release drivers/media/dvb-core/dvb_frontend.c:2789 __dvb_frontend_free+0xad/0x120 drivers/media/dvb-core/dvb_frontend.c:153 dvb_frontend_put+0x59/0x70 drivers/media/dvb-core/dvb_frontend.c:176 dvb_frontend_detach+0x120/0x150 drivers/media/dvb-core/dvb_frontend.c:2803 dvb_usb_adapter_frontend_exit+0xd6/0x160 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:340 dvb_usb_adapter_exit drivers/media/usb/dvb-usb/dvb-usb-init.c:116 dvb_usb_exit+0x9b/0x200 drivers/media/usb/dvb-usb/dvb-usb-init.c:132 dvb_usb_device_exit+0xa5/0xf0 drivers/media/usb/dvb-usb/dvb-usb-init.c:295 
usb_unbind_interface+0x21c/0xa90 drivers/usb/core/driver.c:423 __device_release_driver drivers/base/dd.c:861 device_release_driver_internal+0x4f1/0x5c0 drivers/base/dd.c:893 device_release_driver+0x1e/0x30 drivers/base/dd.c:918 bus_remove_device+0x2f4/0x4b0 drivers/base/bus.c:565 device_del+0x5c4/0xab0 drivers/base/core.c:1985 usb_disable_device+0x1e9/0x680 drivers/usb/core/message.c:1170 usb_disconnect+0x260/0x7a0 drivers/usb/core/hub.c:2124 hub_port_connect drivers/usb/core/hub.c:4754 hub_port_connect_change drivers/usb/core/hub.c:5009 port_event drivers/usb/core/hub.c:5115 hub_event+0x1318/0x3740 drivers/usb/core/hub.c:5195 process_one_work+0xc73/0x1d90 kernel/workqueue.c:2119 worker_thread+0x221/0x1850 kernel/workqueue.c:2253 kthread+0x363/0x440 kernel/kthread.c:231 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431 The buggy address belongs to the object at ffff880067d45500 which belongs to the cache kmalloc-2048 of size 2048 The buggy address is located 1280 bytes inside of 2048-byte region [ffff880067d45500, ffff880067d45d00) The buggy address belongs to the page: page:ffffea00019f5000 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 0000000000000000 0000000000000000 00000001000f000f raw: dead000000000100 dead000000000200 ffff88006c002d80 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff880067d45900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880067d45980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880067d45a00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff880067d45a80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff880067d45b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Fixes: ead666000a5f ("media: dvb_frontend: only use kref after initialized") Reported-by: Andrey Konovalov Suggested-by: Matthias Schwarzott Tested-by: Andrey Konovalov Signed-off-by: Mauro Carvalho Chehab Cc: Guenter Roeck Signed-off-by: Greg Kroah-Hartman --- drivers/media/dvb-core/dvb_frontend.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/media/dvb-core/dvb_frontend.c b/drivers/media/dvb-core/dvb_frontend.c index 0a8234846699..33d844fe2e70 100644 --- a/drivers/media/dvb-core/dvb_frontend.c +++ b/drivers/media/dvb-core/dvb_frontend.c @@ -150,11 +150,8 @@ static void __dvb_frontend_free(struct dvb_frontend *fe) dvb_frontend_invoke_release(fe, fe->ops.release); - if (!fepriv) - return; - - kfree(fepriv); - fe->frontend_priv = NULL; + if (fepriv) + kfree(fepriv); } static void dvb_frontend_free(struct kref *ref) From 3afae8437c3cbc22966762e80e81818f5a90eb06 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Sun, 17 Dec 2017 15:08:14 +0100 Subject: [PATCH 0376/1013] Linux 4.14.7 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 793183f9e3e5..39d7af0165a8 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 6 +SUBLEVEL = 7 EXTRAVERSION = NAME = Petit Gorille From 3d27b022022a88e189c0e9d63c4ac01af354735f Mon Sep 17 00:00:00 2001 From: Martin Kaiser Date: Tue, 17 Oct 2017 22:53:08 +0200 Subject: [PATCH 0377/1013] mfd: fsl-imx25: Clean up irq settings during removal commit 18f77393796848e68909e65d692c1d1436f06e06 upstream. 
When fsl-imx25-tsadc is compiled as a module, loading, unloading and reloading the module will lead to a crash. Unable to handle kernel paging request at virtual address bf005430 [] (irq_find_matching_fwspec) from [] (of_irq_get+0x58/0x74) [] (of_irq_get) from [] (platform_get_irq+0x48/0xc8) [] (platform_get_irq) from [] (mx25_tsadc_probe+0x220/0x2f4 [fsl_imx25_tsadc]) irq_find_matching_fwspec() loops over all registered irq domains. The irq domain is still registered from last time the module was loaded but the pointer to its operations is invalid after the module was unloaded. Add a removal function which clears the irq handler and removes the irq domain. With this cleanup in place, it's possible to unload and reload the module. Signed-off-by: Martin Kaiser Reviewed-by: Lucas Stach Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/fsl-imx25-tsadc.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+) diff --git a/drivers/mfd/fsl-imx25-tsadc.c b/drivers/mfd/fsl-imx25-tsadc.c index b3767c3141e5..461b0990b56f 100644 --- a/drivers/mfd/fsl-imx25-tsadc.c +++ b/drivers/mfd/fsl-imx25-tsadc.c @@ -180,6 +180,19 @@ static int mx25_tsadc_probe(struct platform_device *pdev) return devm_of_platform_populate(dev); } +static int mx25_tsadc_remove(struct platform_device *pdev) +{ + struct mx25_tsadc *tsadc = platform_get_drvdata(pdev); + int irq = platform_get_irq(pdev, 0); + + if (irq) { + irq_set_chained_handler_and_data(irq, NULL, NULL); + irq_domain_remove(tsadc->domain); + } + + return 0; +} + static const struct of_device_id mx25_tsadc_ids[] = { { .compatible = "fsl,imx25-tsadc" }, { /* Sentinel */ } @@ -192,6 +205,7 @@ static struct platform_driver mx25_tsadc_driver = { .of_match_table = of_match_ptr(mx25_tsadc_ids), }, .probe = mx25_tsadc_probe, + .remove = mx25_tsadc_remove, }; module_platform_driver(mx25_tsadc_driver); From 96c2dfaebe1a8eba95d43732a1413c777469128c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 27 Nov 2017 23:23:05 -0800 Subject: [PATCH 0378/1013] crypto: algif_aead - fix reference counting of null skcipher commit b32a7dc8aef1882fbf983eb354837488cc9d54dc upstream. In the AEAD interface for AF_ALG, the reference to the "null skcipher" held by each tfm was being dropped in the wrong place -- when each af_alg_ctx was freed instead of when the aead_tfm was freed. As discovered by syzkaller, a specially crafted program could use this to cause the null skcipher to be freed while it is still in use. Fix it by dropping the reference in the right place. 
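The ownership rule behind the fix can be sketched in user space (shared_helper counter, struct tfm and struct ctx are hypothetical names): a reference taken when the tfm is created must be dropped when the tfm is destroyed, not when one of the possibly many per-socket contexts built on top of it goes away.

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* A shared helper object modelled as a bare reference count. */
static int helper_refs;

static void helper_get(void) { helper_refs++; }
static void helper_put(void) { assert(helper_refs > 0); helper_refs--; }

struct tfm { int dummy; };        /* holds one helper reference for its lifetime */
struct ctx { struct tfm *tfm; };  /* per-socket state, does not own the helper */

static struct tfm *tfm_alloc(void)
{
	struct tfm *tfm = calloc(1, sizeof(*tfm));

	helper_get();                 /* reference acquired here ... */
	return tfm;
}

static void tfm_free(struct tfm *tfm)
{
	helper_put();                 /* ... so it is released here */
	free(tfm);
}

static void ctx_free(struct ctx *ctx)
{
	/* No helper_put() here: the context never took a reference. */
	free(ctx);
}

int main(void)
{
	struct tfm *tfm = tfm_alloc();
	struct ctx *a = calloc(1, sizeof(*a)), *b = calloc(1, sizeof(*b));

	a->tfm = b->tfm = tfm;
	ctx_free(a);
	ctx_free(b);
	tfm_free(tfm);
	printf("refs left: %d\n", helper_refs);  /* prints 0 */
	return 0;
}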
Fixes: 72548b093ee3 ("crypto: algif_aead - copy AAD from src to dst") Reported-by: syzbot Signed-off-by: Eric Biggers Reviewed-by: Stephan Mueller Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/algif_aead.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index d0b45145cb30..3d793bc2aa82 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -503,6 +503,7 @@ static void aead_release(void *private) struct aead_tfm *tfm = private; crypto_free_aead(tfm->aead); + crypto_put_default_null_skcipher2(); kfree(tfm); } @@ -535,7 +536,6 @@ static void aead_sock_destruct(struct sock *sk) unsigned int ivlen = crypto_aead_ivsize(tfm); af_alg_pull_tsgl(sk, ctx->used, NULL, 0); - crypto_put_default_null_skcipher2(); sock_kzfree_s(sk, ctx->iv, ivlen); sock_kfree_s(sk, ctx, ctx->len); af_alg_release_parent(sk); From 80dbdc5ae74d6167edf081f8865df06fb6615578 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sun, 26 Nov 2017 23:16:49 -0800 Subject: [PATCH 0379/1013] crypto: rsa - fix buffer overread when stripping leading zeroes commit d2890c3778b164fde587bc16583f3a1c87233ec5 upstream. In rsa_get_n(), if the buffer contained all 0's and "FIPS mode" is enabled, we would read one byte past the end of the buffer while scanning the leading zeroes. Fix it by checking 'n_sz' before '!*ptr'. This bug was reachable by adding a specially crafted key of type "asymmetric" (requires CONFIG_RSA and CONFIG_X509_CERTIFICATE_PARSER). KASAN report: BUG: KASAN: slab-out-of-bounds in rsa_get_n+0x19e/0x1d0 crypto/rsa_helper.c:33 Read of size 1 at addr ffff88003501a708 by task keyctl/196 CPU: 1 PID: 196 Comm: keyctl Not tainted 4.14.0-09238-g1d3b78bbc6e9 #26 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: rsa_get_n+0x19e/0x1d0 crypto/rsa_helper.c:33 asn1_ber_decoder+0x82a/0x1fd0 lib/asn1_decoder.c:328 rsa_set_pub_key+0xd3/0x320 crypto/rsa.c:278 crypto_akcipher_set_pub_key ./include/crypto/akcipher.h:364 [inline] pkcs1pad_set_pub_key+0xae/0x200 crypto/rsa-pkcs1pad.c:117 crypto_akcipher_set_pub_key ./include/crypto/akcipher.h:364 [inline] public_key_verify_signature+0x270/0x9d0 crypto/asymmetric_keys/public_key.c:106 x509_check_for_self_signed+0x2ea/0x480 crypto/asymmetric_keys/x509_public_key.c:141 x509_cert_parse+0x46a/0x620 crypto/asymmetric_keys/x509_cert_parser.c:129 x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Allocated by task 196: __do_kmalloc mm/slab.c:3711 [inline] __kmalloc_track_caller+0x118/0x2e0 mm/slab.c:3726 kmemdup+0x17/0x40 mm/util.c:118 kmemdup ./include/linux/string.h:414 [inline] x509_cert_parse+0x2cb/0x620 crypto/asymmetric_keys/x509_cert_parser.c:106 x509_key_preparse+0x61/0x750 crypto/asymmetric_keys/x509_public_key.c:174 asymmetric_key_preparse+0xa4/0x150 crypto/asymmetric_keys/asymmetric_type.c:388 key_create_or_update+0x4d4/0x10a0 security/keys/key.c:850 SYSC_add_key security/keys/keyctl.c:122 [inline] SyS_add_key+0xe8/0x290 security/keys/keyctl.c:62 entry_SYSCALL_64_fastpath+0x1f/0x96 Fixes: 5a7de97309f5 ("crypto: rsa - return raw integers for the ASN.1 parser") Cc: Tudor Ambarus Signed-off-by: Eric Biggers Reviewed-by: James Morris Reviewed-by: David 
Howells Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/rsa_helper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/crypto/rsa_helper.c b/crypto/rsa_helper.c index 0b66dc824606..cad395d70d78 100644 --- a/crypto/rsa_helper.c +++ b/crypto/rsa_helper.c @@ -30,7 +30,7 @@ int rsa_get_n(void *context, size_t hdrlen, unsigned char tag, return -EINVAL; if (fips_enabled) { - while (!*ptr && n_sz) { + while (n_sz && !*ptr) { ptr++; n_sz--; } From 902ae89f841de0c8d2857919296923f6332e174f Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 28 Nov 2017 18:01:38 -0800 Subject: [PATCH 0380/1013] crypto: hmac - require that the underlying hash algorithm is unkeyed commit af3ff8045bbf3e32f1a448542e73abb4c8ceb6f1 upstream. Because the HMAC template didn't check that its underlying hash algorithm is unkeyed, trying to use "hmac(hmac(sha3-512-generic))" through AF_ALG or through KEYCTL_DH_COMPUTE resulted in the inner HMAC being used without having been keyed, resulting in sha3_update() being called without sha3_init(), causing a stack buffer overflow. This is a very old bug, but it seems to have only started causing real problems when SHA-3 support was added (requires CONFIG_CRYPTO_SHA3) because the innermost hash's state is ->import()ed from a zeroed buffer, and it just so happens that other hash algorithms are fine with that, but SHA-3 is not. However, there could be arch or hardware-dependent hash algorithms also affected; I couldn't test everything. Fix the bug by introducing a function crypto_shash_alg_has_setkey() which tests whether a shash algorithm is keyed. Then update the HMAC template to require that its underlying hash algorithm is unkeyed. Here is a reproducer: #include #include int main() { int algfd; struct sockaddr_alg addr = { .salg_type = "hash", .salg_name = "hmac(hmac(sha3-512-generic))", }; char key[4096] = { 0 }; algfd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(algfd, (const struct sockaddr *)&addr, sizeof(addr)); setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)); } Here was the KASAN report from syzbot: BUG: KASAN: stack-out-of-bounds in memcpy include/linux/string.h:341 [inline] BUG: KASAN: stack-out-of-bounds in sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161 Write of size 4096 at addr ffff8801cca07c40 by task syzkaller076574/3044 CPU: 1 PID: 3044 Comm: syzkaller076574 Not tainted 4.14.0-mm1+ #25 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x137/0x190 mm/kasan/kasan.c:267 memcpy+0x37/0x50 mm/kasan/kasan.c:303 memcpy include/linux/string.h:341 [inline] sha3_update+0xdf/0x2e0 crypto/sha3_generic.c:161 crypto_shash_update+0xcb/0x220 crypto/shash.c:109 shash_finup_unaligned+0x2a/0x60 crypto/shash.c:151 crypto_shash_finup+0xc4/0x120 crypto/shash.c:165 hmac_finup+0x182/0x330 crypto/hmac.c:152 crypto_shash_finup+0xc4/0x120 crypto/shash.c:165 shash_digest_unaligned+0x9e/0xd0 crypto/shash.c:172 crypto_shash_digest+0xc4/0x120 crypto/shash.c:186 hmac_setkey+0x36a/0x690 crypto/hmac.c:66 crypto_shash_setkey+0xad/0x190 crypto/shash.c:64 shash_async_setkey+0x47/0x60 crypto/shash.c:207 crypto_ahash_setkey+0xaf/0x180 crypto/ahash.c:200 hash_setkey+0x40/0x90 
crypto/algif_hash.c:446 alg_setkey crypto/af_alg.c:221 [inline] alg_setsockopt+0x2a1/0x350 crypto/af_alg.c:254 SYSC_setsockopt net/socket.c:1851 [inline] SyS_setsockopt+0x189/0x360 net/socket.c:1830 entry_SYSCALL_64_fastpath+0x1f/0x96 Reported-by: syzbot Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/hmac.c | 6 +++++- crypto/shash.c | 5 +++-- include/crypto/internal/hash.h | 8 ++++++++ 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/crypto/hmac.c b/crypto/hmac.c index 92871dc2a63e..e74730224f0a 100644 --- a/crypto/hmac.c +++ b/crypto/hmac.c @@ -195,11 +195,15 @@ static int hmac_create(struct crypto_template *tmpl, struct rtattr **tb) salg = shash_attr_alg(tb[1], 0, 0); if (IS_ERR(salg)) return PTR_ERR(salg); + alg = &salg->base; + /* The underlying hash algorithm must be unkeyed */ err = -EINVAL; + if (crypto_shash_alg_has_setkey(salg)) + goto out_put_alg; + ds = salg->digestsize; ss = salg->statesize; - alg = &salg->base; if (ds > alg->cra_blocksize || ss < alg->cra_blocksize) goto out_put_alg; diff --git a/crypto/shash.c b/crypto/shash.c index 325a14da5827..e849d3ee2e27 100644 --- a/crypto/shash.c +++ b/crypto/shash.c @@ -25,11 +25,12 @@ static const struct crypto_type crypto_shash_type; -static int shash_no_setkey(struct crypto_shash *tfm, const u8 *key, - unsigned int keylen) +int shash_no_setkey(struct crypto_shash *tfm, const u8 *key, + unsigned int keylen) { return -ENOSYS; } +EXPORT_SYMBOL_GPL(shash_no_setkey); static int shash_setkey_unaligned(struct crypto_shash *tfm, const u8 *key, unsigned int keylen) diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h index f0b44c16e88f..c2bae8da642c 100644 --- a/include/crypto/internal/hash.h +++ b/include/crypto/internal/hash.h @@ -82,6 +82,14 @@ int ahash_register_instance(struct crypto_template *tmpl, struct ahash_instance *inst); void ahash_free_instance(struct crypto_instance *inst); +int shash_no_setkey(struct crypto_shash *tfm, const u8 *key, + unsigned int keylen); + +static inline bool crypto_shash_alg_has_setkey(struct shash_alg *alg) +{ + return alg->setkey != shash_no_setkey; +} + int crypto_init_ahash_spawn(struct crypto_ahash_spawn *spawn, struct hash_alg_common *alg, struct crypto_instance *inst); From c68b31521d5fb7216cb1113130399afe65437c6c Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 28 Nov 2017 20:56:59 -0800 Subject: [PATCH 0381/1013] crypto: salsa20 - fix blkcipher_walk API usage commit ecaaab5649781c5a0effdaf298a925063020500e upstream. When asked to encrypt or decrypt 0 bytes, both the generic and x86 implementations of Salsa20 crash in blkcipher_walk_done(), either when doing 'kfree(walk->buffer)' or 'free_page((unsigned long)walk->page)', because walk->buffer and walk->page have not been initialized. The bug is that Salsa20 is calling blkcipher_walk_done() even when nothing is in 'walk.nbytes'. But blkcipher_walk_done() is only meant to be called when a nonzero number of bytes have been provided. The broken code is part of an optimization that tries to make only one call to salsa20_encrypt_bytes() to process inputs that are not evenly divisible by 64 bytes. To fix the bug, just remove this "optimization" and use the blkcipher_walk API the same way all the other users do. 
Reproducer: #include #include #include int main() { int algfd, reqfd; struct sockaddr_alg addr = { .salg_type = "skcipher", .salg_name = "salsa20", }; char key[16] = { 0 }; algfd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(algfd, (void *)&addr, sizeof(addr)); reqfd = accept(algfd, 0, 0); setsockopt(algfd, SOL_ALG, ALG_SET_KEY, key, sizeof(key)); read(reqfd, key, sizeof(key)); } Reported-by: syzbot Fixes: eb6f13eb9f81 ("[CRYPTO] salsa20_generic: Fix multi-page processing") Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- arch/x86/crypto/salsa20_glue.c | 7 ------- crypto/salsa20_generic.c | 7 ------- 2 files changed, 14 deletions(-) diff --git a/arch/x86/crypto/salsa20_glue.c b/arch/x86/crypto/salsa20_glue.c index 399a29d067d6..cb91a64a99e7 100644 --- a/arch/x86/crypto/salsa20_glue.c +++ b/arch/x86/crypto/salsa20_glue.c @@ -59,13 +59,6 @@ static int encrypt(struct blkcipher_desc *desc, salsa20_ivsetup(ctx, walk.iv); - if (likely(walk.nbytes == nbytes)) - { - salsa20_encrypt_bytes(ctx, walk.src.virt.addr, - walk.dst.virt.addr, nbytes); - return blkcipher_walk_done(desc, &walk, 0); - } - while (walk.nbytes >= 64) { salsa20_encrypt_bytes(ctx, walk.src.virt.addr, walk.dst.virt.addr, diff --git a/crypto/salsa20_generic.c b/crypto/salsa20_generic.c index f550b5d94630..d7da0eea5622 100644 --- a/crypto/salsa20_generic.c +++ b/crypto/salsa20_generic.c @@ -188,13 +188,6 @@ static int encrypt(struct blkcipher_desc *desc, salsa20_ivsetup(ctx, walk.iv); - if (likely(walk.nbytes == nbytes)) - { - salsa20_encrypt_bytes(ctx, walk.dst.virt.addr, - walk.src.virt.addr, nbytes); - return blkcipher_walk_done(desc, &walk, 0); - } - while (walk.nbytes >= 64) { salsa20_encrypt_bytes(ctx, walk.dst.virt.addr, walk.src.virt.addr, From cf1048e46d4f301261d946e73add464fb95fd5a4 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 28 Nov 2017 00:46:24 -0800 Subject: [PATCH 0382/1013] crypto: af_alg - fix NULL pointer dereference in commit 887207ed9e5812ed9239b6d07185a2d35dda91db upstream. af_alg_free_areq_sgls() If allocating the ->tsgl member of 'struct af_alg_async_req' failed, during cleanup we dereferenced the NULL ->tsgl pointer in af_alg_free_areq_sgls(), because ->tsgl_entries was nonzero. Fix it by only freeing the ->tsgl list if it is non-NULL. This affected both algif_skcipher and algif_aead. 
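A minimal user-space sketch of the cleanup rule (struct areq and the page helpers are hypothetical): when a member may have failed to allocate, the cleanup path keys off the pointer itself rather than off a count that was set before the allocation was attempted.

#include <stdio.h>
#include <stdlib.h>

struct page { int refs; };

struct areq {
	struct page **tsgl;   /* may be NULL if the allocation failed */
	int tsgl_entries;     /* can be nonzero even when tsgl is NULL */
};

static void put_page(struct page *p)
{
	if (p)
		p->refs--;
}

static void free_areq(struct areq *a)
{
	int i;

	if (a->tsgl) {                       /* the pointer, not the count, decides */
		for (i = 0; i < a->tsgl_entries; i++)
			put_page(a->tsgl[i]);
		free(a->tsgl);
	}
}

int main(void)
{
	/* Simulate the failure path: entry count computed, allocation failed. */
	struct areq a = { .tsgl = NULL, .tsgl_entries = 4 };

	free_areq(&a);                       /* no NULL dereference */
	puts("cleanup survived a failed allocation");
	return 0;
}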
Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Reported-by: syzbot Signed-off-by: Eric Biggers Reviewed-by: Stephan Mueller Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/af_alg.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index a72659f452a5..e181073ef64d 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -699,14 +699,15 @@ void af_alg_free_areq_sgls(struct af_alg_async_req *areq) } tsgl = areq->tsgl; - for_each_sg(tsgl, sg, areq->tsgl_entries, i) { - if (!sg_page(sg)) - continue; - put_page(sg_page(sg)); - } + if (tsgl) { + for_each_sg(tsgl, sg, areq->tsgl_entries, i) { + if (!sg_page(sg)) + continue; + put_page(sg_page(sg)); + } - if (areq->tsgl && areq->tsgl_entries) sock_kfree_s(sk, tsgl, areq->tsgl_entries * sizeof(*tsgl)); + } } EXPORT_SYMBOL_GPL(af_alg_free_areq_sgls); From 6f3fc5f95975e101b198ecf223ad6f9f8994b191 Mon Sep 17 00:00:00 2001 From: Ronnie Sahlberg Date: Tue, 21 Nov 2017 09:36:33 +1100 Subject: [PATCH 0383/1013] cifs: fix NULL deref in SMB2_read commit a821df3f1af72aa6a0d573eea94a7dd2613e9f4e upstream. Signed-off-by: Ronnie Sahlberg Reviewed-by: Pavel Shilovsky Signed-off-by: Steve French Signed-off-by: Greg Kroah-Hartman --- fs/cifs/smb2pdu.c | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c index 5331631386a2..01346b8b6edb 100644 --- a/fs/cifs/smb2pdu.c +++ b/fs/cifs/smb2pdu.c @@ -2678,27 +2678,27 @@ SMB2_read(const unsigned int xid, struct cifs_io_parms *io_parms, cifs_small_buf_release(req); rsp = (struct smb2_read_rsp *)rsp_iov.iov_base; - shdr = get_sync_hdr(rsp); - if (shdr->Status == STATUS_END_OF_FILE) { + if (rc) { + if (rc != -ENODATA) { + cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE); + cifs_dbg(VFS, "Send error in read = %d\n", rc); + } free_rsp_buf(resp_buftype, rsp_iov.iov_base); - return 0; + return rc == -ENODATA ? 0 : rc; } - if (rc) { - cifs_stats_fail_inc(io_parms->tcon, SMB2_READ_HE); - cifs_dbg(VFS, "Send error in read = %d\n", rc); - } else { - *nbytes = le32_to_cpu(rsp->DataLength); - if ((*nbytes > CIFS_MAX_MSGSIZE) || - (*nbytes > io_parms->length)) { - cifs_dbg(FYI, "bad length %d for count %d\n", - *nbytes, io_parms->length); - rc = -EIO; - *nbytes = 0; - } + *nbytes = le32_to_cpu(rsp->DataLength); + if ((*nbytes > CIFS_MAX_MSGSIZE) || + (*nbytes > io_parms->length)) { + cifs_dbg(FYI, "bad length %d for count %d\n", + *nbytes, io_parms->length); + rc = -EIO; + *nbytes = 0; } + shdr = get_sync_hdr(rsp); + if (*buf) { memcpy(*buf, (char *)shdr + rsp->DataOffset, *nbytes); free_rsp_buf(resp_buftype, rsp_iov.iov_base); From c8def7a4185b8b1ec2b843af346977f00259414e Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 14 Dec 2017 15:32:34 -0800 Subject: [PATCH 0384/1013] string.h: workaround for increased stack usage commit 146734b091430c80d80bb96b1139a96fb4bc830e upstream. 
The hardened strlen() function causes rather large stack usage in at least one file in the kernel, in particular when CONFIG_KASAN is enabled: drivers/media/usb/em28xx/em28xx-dvb.c: In function 'em28xx_dvb_init': drivers/media/usb/em28xx/em28xx-dvb.c:2062:1: error: the frame size of 3256 bytes is larger than 204 bytes [-Werror=frame-larger-than=] Analyzing this problem led to the discovery that gcc fails to merge the stack slots for the i2c_board_info[] structures after we strlcpy() into them, due to the 'noreturn' attribute on the source string length check. I reported this as a gcc bug, but it is unlikely to get fixed for gcc-8, since it is relatively easy to work around, and it gets triggered rarely. An earlier workaround I did added an empty inline assembly statement before the call to fortify_panic(), which works surprisingly well, but is really ugly and unintuitive. This is a new approach to the same problem, this time addressing it by not calling the 'extern __real_strnlen()' function for string constants where __builtin_strlen() is a compile-time constant and therefore known to be safe. We do this by checking if the last character in the string is a compile-time constant '\0'. If it is, we can assume that strlen() of the string is also constant. As a side-effect, this should also improve the object code output for any other call of strlen() on a string constant. [akpm@linux-foundation.org: add comment] Link: http://lkml.kernel.org/r/20171205215143.3085755-1-arnd@arndb.de Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365 Link: https://patchwork.kernel.org/patch/9980413/ Link: https://patchwork.kernel.org/patch/9974047/ Fixes: 6974f0c4555 ("include/linux/string.h: add the option of fortified string.h functions") Signed-off-by: Arnd Bergmann Cc: Kees Cook Cc: Mauro Carvalho Chehab Cc: Dmitry Vyukov Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Daniel Micay Cc: Greg Kroah-Hartman Cc: Martin Wilck Cc: Dan Williams Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/string.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/string.h b/include/linux/string.h index 410ecf17de3c..cfd83eb2f926 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -259,7 +259,10 @@ __FORTIFY_INLINE __kernel_size_t strlen(const char *p) { __kernel_size_t ret; size_t p_size = __builtin_object_size(p, 0); - if (p_size == (size_t)-1) + + /* Work around gcc excess stack consumption issue */ + if (p_size == (size_t)-1 || + (__builtin_constant_p(p[p_size - 1]) && p[p_size - 1] == '\0')) return __builtin_strlen(p); ret = strnlen(p, p_size); if (p_size <= ret) From d3d2f01a6eaf6022167c94afba34b6d6722392c9 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Thu, 14 Dec 2017 15:32:38 -0800 Subject: [PATCH 0385/1013] autofs: fix careless error in recent commit commit 302ec300ef8a545a7fc7f667e5fd743b091c2eeb upstream. Commit ecc0c469f277 ("autofs: don't fail mount for transient error") was meant to replace an 'if' with a 'switch', but instead added the 'switch' leaving the case in place. 
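The bug class is easy to miss in review; a small user-space reproduction of the pattern (autofs_write_demo is a hypothetical stand-in for the pipe write) shows why the stray 'if' is harmful: on failure the call runs twice and the switch only sees the second result, while on success the switch body is skipped entirely.

#include <stdio.h>

static int calls;

/* Hypothetical stand-in for the pipe write: one side effect per call. */
static int autofs_write_demo(void)
{
	calls++;
	return -11;   /* pretend the write failed, e.g. -EAGAIN */
}

static void buggy(void)
{
	int ret;

	/* Leftover 'if': the write runs again inside the switch on failure;
	 * with a zero return the switch would never run at all. */
	if (autofs_write_demo())
		switch (ret = autofs_write_demo()) {
		case 0:
			break;
		default:
			printf("  handled error %d\n", ret);
		}
}

static void fixed(void)
{
	int ret;

	switch (ret = autofs_write_demo()) {  /* single call, as intended */
	case 0:
		break;
	default:
		printf("  handled error %d\n", ret);
	}
}

int main(void)
{
	calls = 0; buggy(); printf("buggy: %d call(s)\n", calls);
	calls = 0; fixed(); printf("fixed: %d call(s)\n", calls);
	return 0;
}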
Link: http://lkml.kernel.org/r/87zi6wstmw.fsf@notabene.neil.brown.name Fixes: ecc0c469f277 ("autofs: don't fail mount for transient error") Reported-by: Ben Hutchings Signed-off-by: NeilBrown Cc: Ian Kent Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/autofs4/waitq.c | 1 - 1 file changed, 1 deletion(-) diff --git a/fs/autofs4/waitq.c b/fs/autofs4/waitq.c index 8fc41705c7cd..961a12dc6dc8 100644 --- a/fs/autofs4/waitq.c +++ b/fs/autofs4/waitq.c @@ -170,7 +170,6 @@ static void autofs4_notify_daemon(struct autofs_sb_info *sbi, mutex_unlock(&sbi->wq_mutex); - if (autofs4_write(sbi, pipe, &pkt, pktsz)) switch (ret = autofs4_write(sbi, pipe, &pkt, pktsz)) { case 0: break; From c5d9b78d530a45f0c226317764a7865f7911ab46 Mon Sep 17 00:00:00 2001 From: Thiago Rafael Becker Date: Thu, 14 Dec 2017 15:33:12 -0800 Subject: [PATCH 0386/1013] kernel: make groups_sort calling a responsibility group_info allocators commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream. In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com Signed-off-by: Thiago Rafael Becker Reviewed-by: Matthew Wilcox Reviewed-by: NeilBrown Acked-by: "J. Bruce Fields" Cc: Al Viro Cc: Martin Schwidefsky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/s390/kernel/compat_linux.c | 1 + fs/nfsd/auth.c | 3 +++ include/linux/cred.h | 1 + kernel/groups.c | 5 +++-- kernel/uid16.c | 1 + net/sunrpc/auth_gss/gss_rpc_xdr.c | 1 + net/sunrpc/auth_gss/svcauth_gss.c | 1 + net/sunrpc/svcauth_unix.c | 2 ++ 8 files changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/s390/kernel/compat_linux.c b/arch/s390/kernel/compat_linux.c index f04db3779b34..59eea9c65d3e 100644 --- a/arch/s390/kernel/compat_linux.c +++ b/arch/s390/kernel/compat_linux.c @@ -263,6 +263,7 @@ COMPAT_SYSCALL_DEFINE2(s390_setgroups16, int, gidsetsize, u16 __user *, grouplis return retval; } + groups_sort(group_info); retval = set_current_groups(group_info); put_group_info(group_info); diff --git a/fs/nfsd/auth.c b/fs/nfsd/auth.c index 697f8ae7792d..f650e475d8f0 100644 --- a/fs/nfsd/auth.c +++ b/fs/nfsd/auth.c @@ -60,6 +60,9 @@ int nfsd_setuser(struct svc_rqst *rqstp, struct svc_export *exp) gi->gid[i] = exp->ex_anon_gid; else gi->gid[i] = rqgi->gid[i]; + + /* Each thread allocates its own gi, no race */ + groups_sort(gi); } } else { gi = get_group_info(rqgi); diff --git a/include/linux/cred.h b/include/linux/cred.h index 099058e1178b..631286535d0f 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -83,6 +83,7 @@ extern int set_current_groups(struct group_info *); extern void set_groups(struct cred *, struct group_info *); extern int groups_search(const struct group_info *, kgid_t); extern bool may_setgroups(void); +extern void groups_sort(struct group_info *); /* * The security context of a task diff --git a/kernel/groups.c b/kernel/groups.c index e357bc800111..daae2f2dc6d4 100644 --- a/kernel/groups.c +++ b/kernel/groups.c @@ -86,11 +86,12 @@ static int gid_cmp(const void *_a, const void *_b) return gid_gt(a, b) - gid_lt(a, b); } -static void 
groups_sort(struct group_info *group_info) +void groups_sort(struct group_info *group_info) { sort(group_info->gid, group_info->ngroups, sizeof(*group_info->gid), gid_cmp, NULL); } +EXPORT_SYMBOL(groups_sort); /* a simple bsearch */ int groups_search(const struct group_info *group_info, kgid_t grp) @@ -122,7 +123,6 @@ int groups_search(const struct group_info *group_info, kgid_t grp) void set_groups(struct cred *new, struct group_info *group_info) { put_group_info(new->group_info); - groups_sort(group_info); get_group_info(group_info); new->group_info = group_info; } @@ -206,6 +206,7 @@ SYSCALL_DEFINE2(setgroups, int, gidsetsize, gid_t __user *, grouplist) return retval; } + groups_sort(group_info); retval = set_current_groups(group_info); put_group_info(group_info); diff --git a/kernel/uid16.c b/kernel/uid16.c index ce74a4901d2b..ef1da2a5f9bd 100644 --- a/kernel/uid16.c +++ b/kernel/uid16.c @@ -192,6 +192,7 @@ SYSCALL_DEFINE2(setgroups16, int, gidsetsize, old_gid_t __user *, grouplist) return retval; } + groups_sort(group_info); retval = set_current_groups(group_info); put_group_info(group_info); diff --git a/net/sunrpc/auth_gss/gss_rpc_xdr.c b/net/sunrpc/auth_gss/gss_rpc_xdr.c index c4778cae58ef..444380f968f1 100644 --- a/net/sunrpc/auth_gss/gss_rpc_xdr.c +++ b/net/sunrpc/auth_gss/gss_rpc_xdr.c @@ -231,6 +231,7 @@ static int gssx_dec_linux_creds(struct xdr_stream *xdr, goto out_free_groups; creds->cr_group_info->gid[i] = kgid; } + groups_sort(creds->cr_group_info); return 0; out_free_groups: diff --git a/net/sunrpc/auth_gss/svcauth_gss.c b/net/sunrpc/auth_gss/svcauth_gss.c index 7b1ee5a0b03c..f41ffb22652c 100644 --- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -481,6 +481,7 @@ static int rsc_parse(struct cache_detail *cd, goto out; rsci.cred.cr_group_info->gid[i] = kgid; } + groups_sort(rsci.cred.cr_group_info); /* mech name */ len = qword_get(&mesg, buf, mlen); diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index f81eaa8e0888..acb70d235e47 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -520,6 +520,7 @@ static int unix_gid_parse(struct cache_detail *cd, ug.gi->gid[i] = kgid; } + groups_sort(ug.gi); ugp = unix_gid_lookup(cd, uid); if (ugp) { struct cache_head *ch; @@ -819,6 +820,7 @@ svcauth_unix_accept(struct svc_rqst *rqstp, __be32 *authp) kgid_t kgid = make_kgid(&init_user_ns, svc_getnl(argv)); cred->cr_group_info->gid[i] = kgid; } + groups_sort(cred->cr_group_info); if (svc_getu32(argv) != htonl(RPC_AUTH_NULL) || svc_getu32(argv) != 0) { *authp = rpc_autherr_badverf; return SVC_DENIED; From 55fe4698d80ecb553d015b5df96ccf9584cbc06b Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Thu, 14 Dec 2017 15:33:15 -0800 Subject: [PATCH 0387/1013] mm, oom_reaper: fix memory corruption commit 4837fe37adff1d159904f0c013471b1ecbcb455e upstream. David Rientjes has reported the following memory corruption while the oom reaper tries to unmap the victims address space BUG: Bad page map in process oom_reaper pte:6353826300000000 pmd:00000000 addr:00007f50cab1d000 vm_flags:08100073 anon_vma:ffff9eea335603f0 mapping: (null) index:7f50cab1d file: (null) fault: (null) mmap: (null) readpage: (null) CPU: 2 PID: 1001 Comm: oom_reaper Call Trace: unmap_page_range+0x1068/0x1130 __oom_reap_task_mm+0xd5/0x16b oom_reaper+0xff/0x14c kthread+0xc1/0xe0 Tetsuo Handa has noticed that the synchronization inside exit_mmap is insufficient. 
We only synchronize with the oom reaper if tsk_is_oom_victim which is not true if the final __mmput is called from a different context than the oom victim exit path. This can trivially happen from context of any task which has grabbed mm reference (e.g. to read /proc// file which requires mm etc.). The race would look like this oom_reaper oom_victim task mmget_not_zero do_exit mmput __oom_reap_task_mm mmput __mmput exit_mmap remove_vma unmap_page_range Fix this issue by providing a new mm_is_oom_victim() helper which operates on the mm struct rather than a task. Any context which operates on a remote mm struct should use this helper in place of tsk_is_oom_victim. The flag is set in mark_oom_victim and never cleared so it is stable in the exit_mmap path. Debugged by Tetsuo Handa. Link: http://lkml.kernel.org/r/20171210095130.17110-1-mhocko@kernel.org Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Michal Hocko Reported-by: David Rientjes Acked-by: David Rientjes Cc: Tetsuo Handa Cc: Andrea Argangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- include/linux/oom.h | 9 +++++++++ include/linux/sched/coredump.h | 1 + mm/mmap.c | 10 +++++----- mm/oom_kill.c | 4 +++- 4 files changed, 18 insertions(+), 6 deletions(-) diff --git a/include/linux/oom.h b/include/linux/oom.h index 01c91d874a57..5bad038ac012 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -66,6 +66,15 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk) return tsk->signal->oom_mm; } +/* + * Use this helper if tsk->mm != mm and the victim mm needs a special + * handling. This is guaranteed to stay true after once set. + */ +static inline bool mm_is_oom_victim(struct mm_struct *mm) +{ + return test_bit(MMF_OOM_VICTIM, &mm->flags); +} + /* * Checks whether a page fault on the given mm is still reliable. * This is no longer true if the oom reaper started to reap the diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h index 9c8847395b5e..ec912d01126f 100644 --- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -70,6 +70,7 @@ static inline int get_dumpable(struct mm_struct *mm) #define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */ #define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */ #define MMF_DISABLE_THP 24 /* disable THP for all VMAs */ +#define MMF_OOM_VICTIM 25 /* mm is the oom victim */ #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ diff --git a/mm/mmap.c b/mm/mmap.c index 476e810cf100..0de87a376aaa 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -3004,20 +3004,20 @@ void exit_mmap(struct mm_struct *mm) /* Use -1 here to ensure all VMAs in the mm are unmapped */ unmap_vmas(&tlb, vma, 0, -1); - set_bit(MMF_OOM_SKIP, &mm->flags); - if (unlikely(tsk_is_oom_victim(current))) { + if (unlikely(mm_is_oom_victim(mm))) { /* * Wait for oom_reap_task() to stop working on this * mm. Because MMF_OOM_SKIP is already set before * calling down_read(), oom_reap_task() will not run * on this "mm" post up_write(). 
* - * tsk_is_oom_victim() cannot be set from under us - * either because current->mm is already set to NULL + * mm_is_oom_victim() cannot be set from under us + * either because victim->mm is already set to NULL * under task_lock before calling mmput and oom_mm is - * set not NULL by the OOM killer only if current->mm + * set not NULL by the OOM killer only if victim->mm * is found not NULL while holding the task_lock. */ + set_bit(MMF_OOM_SKIP, &mm->flags); down_write(&mm->mmap_sem); up_write(&mm->mmap_sem); } diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 18c5b356b505..10aed8d8c080 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -673,8 +673,10 @@ static void mark_oom_victim(struct task_struct *tsk) return; /* oom_mm is bound to the signal struct life time. */ - if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) + if (!cmpxchg(&tsk->signal->oom_mm, NULL, mm)) { mmgrab(tsk->signal->oom_mm); + set_bit(MMF_OOM_VICTIM, &mm->flags); + } /* * Make sure that the task is woken up from uninterruptible sleep From b8582c0f792fc1603a8998de60a156359f15dfd4 Mon Sep 17 00:00:00 2001 From: Changbin Du Date: Thu, 30 Nov 2017 11:39:43 +0800 Subject: [PATCH 0388/1013] tracing: Allocate mask_str buffer dynamically commit 90e406f96f630c07d631a021fd4af10aac913e77 upstream. The default NR_CPUS can be very large, but actual possible nr_cpu_ids usually is very small. For my x86 distribution, the NR_CPUS is 8192 and nr_cpu_ids is 4. About 2 pages are wasted. Most machines don't have so many CPUs, so define a array with NR_CPUS just wastes memory. So let's allocate the buffer dynamically when need. With this change, the mutext tracing_cpumask_update_lock also can be removed now, which was used to protect mask_str. Link: http://lkml.kernel.org/r/1512013183-19107-1-git-send-email-changbin.du@intel.com Fixes: 36dfe9252bd4c ("ftrace: make use of tracing_cpumask") Signed-off-by: Changbin Du Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace.c | 29 +++++++++-------------------- 1 file changed, 9 insertions(+), 20 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 752e5daf0896..80de14973b42 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4178,37 +4178,30 @@ static const struct file_operations show_traces_fops = { .llseek = seq_lseek, }; -/* - * The tracer itself will not take this lock, but still we want - * to provide a consistent cpumask to user-space: - */ -static DEFINE_MUTEX(tracing_cpumask_update_lock); - -/* - * Temporary storage for the character representation of the - * CPU bitmask (and one more byte for the newline): - */ -static char mask_str[NR_CPUS + 1]; - static ssize_t tracing_cpumask_read(struct file *filp, char __user *ubuf, size_t count, loff_t *ppos) { struct trace_array *tr = file_inode(filp)->i_private; + char *mask_str; int len; - mutex_lock(&tracing_cpumask_update_lock); + len = snprintf(NULL, 0, "%*pb\n", + cpumask_pr_args(tr->tracing_cpumask)) + 1; + mask_str = kmalloc(len, GFP_KERNEL); + if (!mask_str) + return -ENOMEM; - len = snprintf(mask_str, count, "%*pb\n", + len = snprintf(mask_str, len, "%*pb\n", cpumask_pr_args(tr->tracing_cpumask)); if (len >= count) { count = -EINVAL; goto out_err; } - count = simple_read_from_buffer(ubuf, count, ppos, mask_str, NR_CPUS+1); + count = simple_read_from_buffer(ubuf, count, ppos, mask_str, len); out_err: - mutex_unlock(&tracing_cpumask_update_lock); + kfree(mask_str); return count; } @@ -4228,8 +4221,6 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, 
if (err) goto err_unlock; - mutex_lock(&tracing_cpumask_update_lock); - local_irq_disable(); arch_spin_lock(&tr->max_lock); for_each_tracing_cpu(cpu) { @@ -4252,8 +4243,6 @@ tracing_cpumask_write(struct file *filp, const char __user *ubuf, local_irq_enable(); cpumask_copy(tr->tracing_cpumask, tracing_cpumask_new); - - mutex_unlock(&tracing_cpumask_update_lock); free_cpumask_var(tracing_cpumask_new); return count; From 6391294874d505a5e8681c41439a74e7019b267f Mon Sep 17 00:00:00 2001 From: David Kozub Date: Tue, 5 Dec 2017 22:40:04 +0100 Subject: [PATCH 0389/1013] USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID commit 62354454625741f0569c2cbe45b2d192f8fd258e upstream. There is another JMS567-based USB3 UAS enclosure (152d:0578) that fails with the following error: [sda] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [sda] tag#0 Sense Key : Illegal Request [current] [sda] tag#0 Add. Sense: Invalid field in cdb The issue occurs both with UAS (occasionally) and mass storage (immediately after mounting a FS on a disk in the enclosure). Enabling US_FL_BROKEN_FUA quirk solves this issue. This patch adds an UNUSUAL_DEV with US_FL_BROKEN_FUA for the enclosure for both UAS and mass storage. Signed-off-by: David Kozub Acked-by: Alan Stern Signed-off-by: Greg Kroah-Hartman --- drivers/usb/storage/unusual_devs.h | 7 +++++++ drivers/usb/storage/unusual_uas.h | 7 +++++++ 2 files changed, 14 insertions(+) diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h index eb06d88b41d6..9af39644561f 100644 --- a/drivers/usb/storage/unusual_devs.h +++ b/drivers/usb/storage/unusual_devs.h @@ -2113,6 +2113,13 @@ UNUSUAL_DEV( 0x152d, 0x0567, 0x0114, 0x0116, USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_BROKEN_FUA ), +/* Reported by David Kozub */ +UNUSUAL_DEV(0x152d, 0x0578, 0x0000, 0x9999, + "JMicron", + "JMS567", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_BROKEN_FUA), + /* * Reported by Alexandre Oliva * JMicron responds to USN and several other SCSI ioctls with a diff --git a/drivers/usb/storage/unusual_uas.h b/drivers/usb/storage/unusual_uas.h index cde115359793..9f356f7cf7d5 100644 --- a/drivers/usb/storage/unusual_uas.h +++ b/drivers/usb/storage/unusual_uas.h @@ -142,6 +142,13 @@ UNUSUAL_DEV(0x152d, 0x0567, 0x0000, 0x9999, USB_SC_DEVICE, USB_PR_DEVICE, NULL, US_FL_BROKEN_FUA | US_FL_NO_REPORT_OPCODES), +/* Reported-by: David Kozub */ +UNUSUAL_DEV(0x152d, 0x0578, 0x0000, 0x9999, + "JMicron", + "JMS567", + USB_SC_DEVICE, USB_PR_DEVICE, NULL, + US_FL_BROKEN_FUA), + /* Reported-by: Hans de Goede */ UNUSUAL_DEV(0x2109, 0x0711, 0x0000, 0x9999, "VIA", From 4c5ae6a301a5415d1334f6c655bebf91d475bd89 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Tue, 12 Dec 2017 14:25:13 -0500 Subject: [PATCH 0390/1013] USB: core: prevent malicious bNumInterfaces overflow commit 48a4ff1c7bb5a32d2e396b03132d20d552c0eca7 upstream. A malicious USB device with crafted descriptors can cause the kernel to access unallocated memory by setting the bNumInterfaces value too high in a configuration descriptor. Although the value is adjusted during parsing, this adjustment is skipped in one of the error return paths. This patch prevents the problem by setting bNumInterfaces to 0 initially. The existing code already sets it to the proper value after parsing is complete. 
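The same defensive shape can be shown with a small user-space parser (struct cfg, parse_cfg() and the limit are hypothetical, not the USB core's types): the device-supplied count is copied out, the stored field is zeroed immediately so any early error return leaves a safe value, and the real count is written back only after it has been clamped and the interfaces parsed.

#include <stdio.h>

#define MAX_INTERFACES 32

struct cfg {
	unsigned char num_interfaces;  /* only ever holds a validated value */
};

/* buf[0] is a device-supplied interface count; treat it as hostile. */
static int parse_cfg(struct cfg *cfg, const unsigned char *buf, unsigned int len)
{
	unsigned int nintf;

	cfg->num_interfaces = 0;       /* nothing trusted yet */

	if (len < 1)
		return -1;             /* early exit leaves the safe zero behind */

	nintf = buf[0];
	if (nintf > MAX_INTERFACES) {
		fprintf(stderr, "clamping %u interfaces to %d\n",
			nintf, MAX_INTERFACES);
		nintf = MAX_INTERFACES;
	}

	/* ... parse each of the nintf interface descriptors here ... */

	cfg->num_interfaces = (unsigned char)nintf;  /* set only after parsing */
	return 0;
}

int main(void)
{
	struct cfg cfg;
	unsigned char evil[] = { 0xff };   /* claims 255 interfaces */

	parse_cfg(&cfg, evil, sizeof(evil));
	printf("accepted %u interfaces\n", cfg.num_interfaces);
	return 0;
}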
Signed-off-by: Alan Stern Reported-by: Andrey Konovalov Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/config.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index c42a3e63eb07..843ef46d2537 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -555,6 +555,9 @@ static int usb_parse_configuration(struct usb_device *dev, int cfgidx, unsigned iad_num = 0; memcpy(&config->desc, buffer, USB_DT_CONFIG_SIZE); + nintf = nintf_orig = config->desc.bNumInterfaces; + config->desc.bNumInterfaces = 0; // Adjusted later + if (config->desc.bDescriptorType != USB_DT_CONFIG || config->desc.bLength < USB_DT_CONFIG_SIZE || config->desc.bLength > size) { @@ -568,7 +571,6 @@ static int usb_parse_configuration(struct usb_device *dev, int cfgidx, buffer += config->desc.bLength; size -= config->desc.bLength; - nintf = nintf_orig = config->desc.bNumInterfaces; if (nintf > USB_MAXINTERFACES) { dev_warn(ddev, "config %d has too many interfaces: %d, " "using maximum allowed: %d\n", From 066f40dc495d5ab2ec9a871cb8703d9591d636c5 Mon Sep 17 00:00:00 2001 From: Vivek Goyal Date: Mon, 27 Nov 2017 10:12:44 -0500 Subject: [PATCH 0391/1013] ovl: Pass ovl_get_nlink() parameters in right order commit 08d8f8a5b094b66b29936e8751b4a818b8db1207 upstream. Right now we seem to be passing index as "lowerdentry" and origin.dentry as "upperdentry". IIUC, we should pass these parameters in reversed order and this looks like a bug. Signed-off-by: Vivek Goyal Acked-by: Amir Goldstein Fixes: caf70cb2ba5d ("ovl: cleanup orphan index entries") Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/overlayfs/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/overlayfs/namei.c b/fs/overlayfs/namei.c index bc6d5c5a3443..4bb7e4f53ea6 100644 --- a/fs/overlayfs/namei.c +++ b/fs/overlayfs/namei.c @@ -437,7 +437,7 @@ int ovl_verify_index(struct dentry *index, struct path *lowerstack, /* Check if index is orphan and don't warn before cleaning it */ if (d_inode(index)->i_nlink == 1 && - ovl_get_nlink(index, origin.dentry, 0) == 0) + ovl_get_nlink(origin.dentry, index, 0) == 0) err = -ENOENT; dput(origin.dentry); From cb6ee0f3c19980328529d5ed7a1650e41164ca49 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 29 Nov 2017 07:35:21 +0200 Subject: [PATCH 0392/1013] ovl: update ctx->pos on impure dir iteration commit b02a16e6413a2f782e542ef60bad9ff6bf212f8a upstream. This fixes a regression with readdir of impure dir in overlayfs that is shared to VM via 9p fs. 
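A compact user-space sketch of the wrapper pattern involved (dir_ctx, inner_iterate and outer_iterate are hypothetical names): when a directory iterator delegates to a second context, the position the inner iteration reached has to be copied back, otherwise the caller keeps re-reading from its old offset.

#include <stdio.h>

struct dir_ctx {
	long pos;   /* read offset handed back to the caller */
};

/* Pretend the real directory has 5 entries; advance pos past them. */
static int inner_iterate(struct dir_ctx *ctx)
{
	while (ctx->pos < 5) {
		printf("entry %ld\n", ctx->pos);
		ctx->pos++;
	}
	return 0;
}

static int outer_iterate(struct dir_ctx *caller_ctx)
{
	struct dir_ctx inner = { .pos = caller_ctx->pos };
	int err = inner_iterate(&inner);

	caller_ctx->pos = inner.pos;   /* conceptually, the line the fix adds */
	return err;
}

int main(void)
{
	struct dir_ctx ctx = { .pos = 0 };

	outer_iterate(&ctx);
	printf("caller pos after readdir: %ld\n", ctx.pos);  /* 5, not 0 */
	return 0;
}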
Reported-by: Miguel Bernal Marin Fixes: 4edb83bb1041 ("ovl: constant d_ino for non-merge dirs") Signed-off-by: Amir Goldstein Tested-by: Miguel Bernal Marin Signed-off-by: Miklos Szeredi Signed-off-by: Greg Kroah-Hartman --- fs/overlayfs/readdir.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index 698b74dd750e..b2c7f33e08fc 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -645,7 +645,10 @@ static int ovl_iterate_real(struct file *file, struct dir_context *ctx) return PTR_ERR(rdt.cache); } - return iterate_dir(od->realfile, &rdt.ctx); + err = iterate_dir(od->realfile, &rdt.ctx); + ctx->pos = rdt.ctx.pos; + + return err; } From 7120d742ad8d0f1fe37e4b73827e166fc1e01eea Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Thu, 7 Dec 2017 14:16:47 -0700 Subject: [PATCH 0393/1013] usbip: fix stub_rx: get_pipe() to validate endpoint number commit 635f545a7e8be7596b9b2b6a43cab6bbd5a88e43 upstream. get_pipe() routine doesn't validate the input endpoint number and uses to reference ep_in and ep_out arrays. Invalid endpoint number can trigger BUG(). Range check the epnum and returning error instead of calling BUG(). Change caller stub_recv_cmd_submit() to handle the get_pipe() error return. Reported-by: Secunia Research Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/stub_rx.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/usb/usbip/stub_rx.c b/drivers/usb/usbip/stub_rx.c index 191b176ffedf..4f03c611c16d 100644 --- a/drivers/usb/usbip/stub_rx.c +++ b/drivers/usb/usbip/stub_rx.c @@ -342,15 +342,15 @@ static int get_pipe(struct stub_device *sdev, int epnum, int dir) struct usb_host_endpoint *ep; struct usb_endpoint_descriptor *epd = NULL; + if (epnum < 0 || epnum > 15) + goto err_ret; + if (dir == USBIP_DIR_IN) ep = udev->ep_in[epnum & 0x7f]; else ep = udev->ep_out[epnum & 0x7f]; - if (!ep) { - dev_err(&sdev->udev->dev, "no such endpoint?, %d\n", - epnum); - BUG(); - } + if (!ep) + goto err_ret; epd = &ep->desc; if (usb_endpoint_xfer_control(epd)) { @@ -381,9 +381,10 @@ static int get_pipe(struct stub_device *sdev, int epnum, int dir) return usb_rcvisocpipe(udev, epnum); } +err_ret: /* NOT REACHED */ - dev_err(&sdev->udev->dev, "get pipe, epnum %d\n", epnum); - return 0; + dev_err(&sdev->udev->dev, "get pipe() invalid epnum %d\n", epnum); + return -1; } static void masking_bogus_flags(struct urb *urb) @@ -449,6 +450,9 @@ static void stub_recv_cmd_submit(struct stub_device *sdev, struct usb_device *udev = sdev->udev; int pipe = get_pipe(sdev, pdu->base.ep, pdu->base.direction); + if (pipe == -1) + return; + priv = stub_priv_alloc(sdev, pdu); if (!priv) return; From 1621db059603e781f61a9bf33cba639b42faf0bc Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Thu, 7 Dec 2017 14:16:48 -0700 Subject: [PATCH 0394/1013] usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input commit c6688ef9f29762e65bce325ef4acd6c675806366 upstream. Harden CMD_SUBMIT path to handle malicious input that could trigger large memory allocations. Add checks to validate transfer_buffer_length and number_of_packets to protect against bad input requesting for unbounded memory allocations. Validate early in get_pipe() and return failure. 
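The arithmetic in the check can be illustrated with a hypothetical user-space validator (struct submit_req, maxp and the limits are illustrative, not the driver's types): the transfer length is bounded first, then the largest packet count that length could legitimately need is derived, and the client-supplied number_of_packets must fall within it.

#include <limits.h>
#include <stdio.h>

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

struct submit_req {
	unsigned long long transfer_buffer_length;  /* from the network, untrusted */
	int number_of_packets;                      /* from the network, untrusted */
};

/* Returns 0 if the isochronous request is sane for an endpoint carrying
 * maxp bytes per packet, -1 otherwise. */
static int validate_isoc(const struct submit_req *req, unsigned int maxp)
{
	unsigned int bytes, packets;

	if (req->transfer_buffer_length > INT_MAX)
		return -1;                          /* unbounded allocation attempt */

	bytes = (unsigned int)req->transfer_buffer_length;
	packets = DIV_ROUND_UP(bytes, maxp);        /* most packets bytes can need */

	if (req->number_of_packets < 0 ||
	    (unsigned int)req->number_of_packets > packets)
		return -1;                          /* inconsistent with the length */

	return 0;
}

int main(void)
{
	struct submit_req ok  = { .transfer_buffer_length = 3000,
				  .number_of_packets = 3 };
	struct submit_req bad = { .transfer_buffer_length = 3000,
				  .number_of_packets = 100000 };

	printf("ok:  %d\n", validate_isoc(&ok, 1024));   /* 0  */
	printf("bad: %d\n", validate_isoc(&bad, 1024));  /* -1 */
	return 0;
}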
Reported-by: Secunia Research Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/stub_rx.c | 35 +++++++++++++++++++++++++++++++---- 1 file changed, 31 insertions(+), 4 deletions(-) diff --git a/drivers/usb/usbip/stub_rx.c b/drivers/usb/usbip/stub_rx.c index 4f03c611c16d..283a9be77a22 100644 --- a/drivers/usb/usbip/stub_rx.c +++ b/drivers/usb/usbip/stub_rx.c @@ -336,11 +336,13 @@ static struct stub_priv *stub_priv_alloc(struct stub_device *sdev, return priv; } -static int get_pipe(struct stub_device *sdev, int epnum, int dir) +static int get_pipe(struct stub_device *sdev, struct usbip_header *pdu) { struct usb_device *udev = sdev->udev; struct usb_host_endpoint *ep; struct usb_endpoint_descriptor *epd = NULL; + int epnum = pdu->base.ep; + int dir = pdu->base.direction; if (epnum < 0 || epnum > 15) goto err_ret; @@ -353,6 +355,15 @@ static int get_pipe(struct stub_device *sdev, int epnum, int dir) goto err_ret; epd = &ep->desc; + + /* validate transfer_buffer_length */ + if (pdu->u.cmd_submit.transfer_buffer_length > INT_MAX) { + dev_err(&sdev->udev->dev, + "CMD_SUBMIT: -EMSGSIZE transfer_buffer_length %d\n", + pdu->u.cmd_submit.transfer_buffer_length); + return -1; + } + if (usb_endpoint_xfer_control(epd)) { if (dir == USBIP_DIR_OUT) return usb_sndctrlpipe(udev, epnum); @@ -375,6 +386,21 @@ static int get_pipe(struct stub_device *sdev, int epnum, int dir) } if (usb_endpoint_xfer_isoc(epd)) { + /* validate packet size and number of packets */ + unsigned int maxp, packets, bytes; + + maxp = usb_endpoint_maxp(epd); + maxp *= usb_endpoint_maxp_mult(epd); + bytes = pdu->u.cmd_submit.transfer_buffer_length; + packets = DIV_ROUND_UP(bytes, maxp); + + if (pdu->u.cmd_submit.number_of_packets < 0 || + pdu->u.cmd_submit.number_of_packets > packets) { + dev_err(&sdev->udev->dev, + "CMD_SUBMIT: isoc invalid num packets %d\n", + pdu->u.cmd_submit.number_of_packets); + return -1; + } if (dir == USBIP_DIR_OUT) return usb_sndisocpipe(udev, epnum); else @@ -383,7 +409,7 @@ static int get_pipe(struct stub_device *sdev, int epnum, int dir) err_ret: /* NOT REACHED */ - dev_err(&sdev->udev->dev, "get pipe() invalid epnum %d\n", epnum); + dev_err(&sdev->udev->dev, "CMD_SUBMIT: invalid epnum %d\n", epnum); return -1; } @@ -448,7 +474,7 @@ static void stub_recv_cmd_submit(struct stub_device *sdev, struct stub_priv *priv; struct usbip_device *ud = &sdev->ud; struct usb_device *udev = sdev->udev; - int pipe = get_pipe(sdev, pdu->base.ep, pdu->base.direction); + int pipe = get_pipe(sdev, pdu); if (pipe == -1) return; @@ -470,7 +496,8 @@ static void stub_recv_cmd_submit(struct stub_device *sdev, } /* allocate urb transfer buffer, if needed */ - if (pdu->u.cmd_submit.transfer_buffer_length > 0) { + if (pdu->u.cmd_submit.transfer_buffer_length > 0 && + pdu->u.cmd_submit.transfer_buffer_length <= INT_MAX) { priv->urb->transfer_buffer = kzalloc(pdu->u.cmd_submit.transfer_buffer_length, GFP_KERNEL); From b6a2ad646c13bb9d1231bce5599cb3176ff33ca4 Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Thu, 7 Dec 2017 14:16:49 -0700 Subject: [PATCH 0395/1013] usbip: prevent vhci_hcd driver from leaking a socket pointer address commit 2f2d0088eb93db5c649d2a5e34a3800a8a935fc5 upstream. When a client has a USB device attached over IP, the vhci_hcd driver is locally leaking a socket pointer address via the /sys/devices/platform/vhci_hcd/status file (world-readable) and in debug output when "usbip --debug port" is run. Fix it to not leak. 
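On the tool side the change is only a different scan format, since the sixth column is now a small integer instead of a kernel pointer. A stand-alone illustration of parsing one line of the new layout; the sample record is made up to match the example rows shown in the updated status comment.

  #include <stdio.h>

  int main(void)
  {
          const char *line = "hs 0000 004 000 00000000 3 1-2.3";
          char hub[3], lbusid[32];
          int port, status, speed, devid;
          unsigned int sockfd;

          int ret = sscanf(line, "%2s %d %d %d %x %u %31s",
                           hub, &port, &status, &speed, &devid,
                           &sockfd, lbusid);

          if (ret < 7)
                  return 1;

          printf("hub %s port %d sockfd %u busid %s\n",
                 hub, port, sockfd, lbusid);
          return 0;
  }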
The socket pointer address is not used at the moment and it was made visible as a convenient way to find IP address from socket pointer address by looking up /proc/net/{tcp,tcp6}. As this opens a security hole, the fix replaces socket pointer address with sockfd. Reported-by: Secunia Research Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/usbip_common.h | 1 + drivers/usb/usbip/vhci_sysfs.c | 25 ++++++++++++++++--------- tools/usb/usbip/libsrc/vhci_driver.c | 8 ++++---- 3 files changed, 21 insertions(+), 13 deletions(-) diff --git a/drivers/usb/usbip/usbip_common.h b/drivers/usb/usbip/usbip_common.h index 3050fc99a417..33737b612b1f 100644 --- a/drivers/usb/usbip/usbip_common.h +++ b/drivers/usb/usbip/usbip_common.h @@ -270,6 +270,7 @@ struct usbip_device { /* lock for status */ spinlock_t lock; + int sockfd; struct socket *tcp_socket; struct task_struct *tcp_rx; diff --git a/drivers/usb/usbip/vhci_sysfs.c b/drivers/usb/usbip/vhci_sysfs.c index 1b9f60a22e0b..84df63e3130d 100644 --- a/drivers/usb/usbip/vhci_sysfs.c +++ b/drivers/usb/usbip/vhci_sysfs.c @@ -31,15 +31,20 @@ /* * output example: - * hub port sta spd dev socket local_busid - * hs 0000 004 000 00000000 c5a7bb80 1-2.3 + * hub port sta spd dev sockfd local_busid + * hs 0000 004 000 00000000 3 1-2.3 * ................................................ - * ss 0008 004 000 00000000 d8cee980 2-3.4 + * ss 0008 004 000 00000000 4 2-3.4 * ................................................ * - * IP address can be retrieved from a socket pointer address by looking - * up /proc/net/{tcp,tcp6}. Also, a userland program may remember a - * port number and its peer IP address. + * Output includes socket fd instead of socket pointer address to avoid + * leaking kernel memory address in: + * /sys/devices/platform/vhci_hcd.0/status and in debug output. + * The socket pointer address is not used at the moment and it was made + * visible as a convenient way to find IP address from socket pointer + * address by looking up /proc/net/{tcp,tcp6}. As this opens a security + * hole, the change is made to use sockfd instead. + * */ static void port_show_vhci(char **out, int hub, int port, struct vhci_device *vdev) { @@ -53,8 +58,8 @@ static void port_show_vhci(char **out, int hub, int port, struct vhci_device *vd if (vdev->ud.status == VDEV_ST_USED) { *out += sprintf(*out, "%03u %08x ", vdev->speed, vdev->devid); - *out += sprintf(*out, "%16p %s", - vdev->ud.tcp_socket, + *out += sprintf(*out, "%u %s", + vdev->ud.sockfd, dev_name(&vdev->udev->dev)); } else { @@ -174,7 +179,8 @@ static ssize_t nports_show(struct device *dev, struct device_attribute *attr, char *s = out; /* - * Half the ports are for SPEED_HIGH and half for SPEED_SUPER, thus the * 2. + * Half the ports are for SPEED_HIGH and half for SPEED_SUPER, + * thus the * 2. 
*/ out += sprintf(out, "%d\n", VHCI_PORTS * vhci_num_controllers); return out - s; @@ -380,6 +386,7 @@ static ssize_t store_attach(struct device *dev, struct device_attribute *attr, vdev->devid = devid; vdev->speed = speed; + vdev->ud.sockfd = sockfd; vdev->ud.tcp_socket = socket; vdev->ud.status = VDEV_ST_NOTASSIGNED; diff --git a/tools/usb/usbip/libsrc/vhci_driver.c b/tools/usb/usbip/libsrc/vhci_driver.c index 8a1cd1616de4..d1fc0f9f00fb 100644 --- a/tools/usb/usbip/libsrc/vhci_driver.c +++ b/tools/usb/usbip/libsrc/vhci_driver.c @@ -50,14 +50,14 @@ static int parse_status(const char *value) while (*c != '\0') { int port, status, speed, devid; - unsigned long socket; + int sockfd; char lbusid[SYSFS_BUS_ID_SIZE]; struct usbip_imported_device *idev; char hub[3]; - ret = sscanf(c, "%2s %d %d %d %x %lx %31s\n", + ret = sscanf(c, "%2s %d %d %d %x %u %31s\n", hub, &port, &status, &speed, - &devid, &socket, lbusid); + &devid, &sockfd, lbusid); if (ret < 5) { dbg("sscanf failed: %d", ret); @@ -66,7 +66,7 @@ static int parse_status(const char *value) dbg("hub %s port %d status %d speed %d devid %x", hub, port, status, speed, devid); - dbg("socket %lx lbusid %s", socket, lbusid); + dbg("sockfd %u lbusid %s", sockfd, lbusid); /* if a device is connected, look at it */ idev = &vhci_driver->idev[port]; From d78a5506cf0ea112124c1ffa5c0aae09b579d96d Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Thu, 7 Dec 2017 14:16:50 -0700 Subject: [PATCH 0396/1013] usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer commit be6123df1ea8f01ee2f896a16c2b7be3e4557a5a upstream. stub_send_ret_submit() handles urb with a potential null transfer_buffer, when it replays a packet with potential malicious data that could contain a null buffer. Add a check for the condition when actual_length > 0 and transfer_buffer is null. Reported-by: Secunia Research Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/stub_tx.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/usb/usbip/stub_tx.c b/drivers/usb/usbip/stub_tx.c index be50cef645d8..87ff94be4235 100644 --- a/drivers/usb/usbip/stub_tx.c +++ b/drivers/usb/usbip/stub_tx.c @@ -181,6 +181,13 @@ static int stub_send_ret_submit(struct stub_device *sdev) memset(&pdu_header, 0, sizeof(pdu_header)); memset(&msg, 0, sizeof(msg)); + if (urb->actual_length > 0 && !urb->transfer_buffer) { + dev_err(&sdev->udev->dev, + "urb: actual_length %d transfer_buffer null\n", + urb->actual_length); + return -1; + } + if (usb_pipetype(urb->pipe) == PIPE_ISOCHRONOUS) iovnum = 2 + urb->number_of_packets; else From 318cea7d8d801cfa70f5b3a3a991425cd502b574 Mon Sep 17 00:00:00 2001 From: Christoph Fritz Date: Sat, 9 Dec 2017 23:47:55 +0100 Subject: [PATCH 0397/1013] mmc: core: apply NO_CMD23 quirk to some specific cards commit 91516a2a4734614d62ee3ed921f8f88acc67c000 upstream. To get an usdhc Apacer and some ATP SD cards work reliable, CMD23 needs to be disabled. This has been tested on i.MX6 (sdhci-esdhc) and rk3288 (dw_mmc-rockchip). Without this patch on i.MX6 (sdhci-esdhc): $ dd if=/dev/urandom of=/mnt/test bs=1M count=10 conv=fsync | | mmc0: starting CMD25 arg 00a71f00 flags 000000b5 | mmc0: blksz 512 blocks 1024 flags 00000100 tsac 3000 ms nsac 0 | mmc0: CMD12 arg 00000000 flags 0000049d | sdhci [sdhci_irq()]: *** mmc0 got interrupt: 0x00000001 | mmc0: Timeout waiting for hardware interrupt. Without this patch on rk3288 (dw_mmc-rockchip): | mmc1: Card stuck in programming state! 
mmcblk1 card_busy_detect | dwmmc_rockchip ff0c0000.dwmmc: Busy; trying anyway | mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, | actual 400000HZ div = 0) | mmc1: card never left busy state | mmc1: tried to reset card, got error -110 | blk_update_request: I/O error, dev mmcblk1, sector 139778 | Buffer I/O error on dev mmcblk1p1, logical block 131586, lost async | page write Signed-off-by: Christoph Fritz Signed-off-by: Ulf Hansson Signed-off-by: Greg Kroah-Hartman --- drivers/mmc/core/card.h | 2 ++ drivers/mmc/core/quirks.h | 8 ++++++++ 2 files changed, 10 insertions(+) diff --git a/drivers/mmc/core/card.h b/drivers/mmc/core/card.h index f06cd91964ce..79a5b985ccf5 100644 --- a/drivers/mmc/core/card.h +++ b/drivers/mmc/core/card.h @@ -75,9 +75,11 @@ struct mmc_fixup { #define EXT_CSD_REV_ANY (-1u) #define CID_MANFID_SANDISK 0x2 +#define CID_MANFID_ATP 0x9 #define CID_MANFID_TOSHIBA 0x11 #define CID_MANFID_MICRON 0x13 #define CID_MANFID_SAMSUNG 0x15 +#define CID_MANFID_APACER 0x27 #define CID_MANFID_KINGSTON 0x70 #define CID_MANFID_HYNIX 0x90 diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h index f664e9cbc9f8..75d317623852 100644 --- a/drivers/mmc/core/quirks.h +++ b/drivers/mmc/core/quirks.h @@ -52,6 +52,14 @@ static const struct mmc_fixup mmc_blk_fixups[] = { MMC_FIXUP("MMC32G", CID_MANFID_TOSHIBA, CID_OEMID_ANY, add_quirk_mmc, MMC_QUIRK_BLK_NO_CMD23), + /* + * Some SD cards lockup while using CMD23 multiblock transfers. + */ + MMC_FIXUP("AF SD", CID_MANFID_ATP, CID_OEMID_ANY, add_quirk_sd, + MMC_QUIRK_BLK_NO_CMD23), + MMC_FIXUP("APUSD", CID_MANFID_APACER, 0x5048, add_quirk_sd, + MMC_QUIRK_BLK_NO_CMD23), + /* * Some MMC cards need longer data read timeout than indicated in CSD. */ From eae219a2fc5554d86b9ac4d31bef73c273e926c0 Mon Sep 17 00:00:00 2001 From: "Yan, Zheng" Date: Thu, 30 Nov 2017 11:59:22 +0800 Subject: [PATCH 0398/1013] ceph: drop negative child dentries before try pruning inode's alias commit 040d786032bf59002d374b86d75b04d97624005c upstream. Negative child dentry holds reference on inode's alias, it makes d_prune_aliases() do nothing. Signed-off-by: "Yan, Zheng" Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov Signed-off-by: Greg Kroah-Hartman --- fs/ceph/mds_client.c | 42 ++++++++++++++++++++++++++++++++++++++---- 1 file changed, 38 insertions(+), 4 deletions(-) diff --git a/fs/ceph/mds_client.c b/fs/ceph/mds_client.c index 0687ab3c3267..bf378ddca4db 100644 --- a/fs/ceph/mds_client.c +++ b/fs/ceph/mds_client.c @@ -1428,6 +1428,29 @@ static int __close_session(struct ceph_mds_client *mdsc, return request_close_session(mdsc, session); } +static bool drop_negative_children(struct dentry *dentry) +{ + struct dentry *child; + bool all_negative = true; + + if (!d_is_dir(dentry)) + goto out; + + spin_lock(&dentry->d_lock); + list_for_each_entry(child, &dentry->d_subdirs, d_child) { + if (d_really_is_positive(child)) { + all_negative = false; + break; + } + } + spin_unlock(&dentry->d_lock); + + if (all_negative) + shrink_dcache_parent(dentry); +out: + return all_negative; +} + /* * Trim old(er) caps. * @@ -1473,16 +1496,27 @@ static int trim_caps_cb(struct inode *inode, struct ceph_cap *cap, void *arg) if ((used | wanted) & ~oissued & mine) goto out; /* we need these caps */ - session->s_trim_caps--; if (oissued) { /* we aren't the only cap.. 
just remove us */ __ceph_remove_cap(cap, true); + session->s_trim_caps--; } else { + struct dentry *dentry; /* try dropping referring dentries */ spin_unlock(&ci->i_ceph_lock); - d_prune_aliases(inode); - dout("trim_caps_cb %p cap %p pruned, count now %d\n", - inode, cap, atomic_read(&inode->i_count)); + dentry = d_find_any_alias(inode); + if (dentry && drop_negative_children(dentry)) { + int count; + dput(dentry); + d_prune_aliases(inode); + count = atomic_read(&inode->i_count); + if (count == 1) + session->s_trim_caps--; + dout("trim_caps_cb %p cap %p pruned, count now %d\n", + inode, cap, count); + } else { + dput(dentry); + } return 0; } From 912fe79116f2456a00cf4a9b2ad3ef207ba3e162 Mon Sep 17 00:00:00 2001 From: Chunfeng Yun Date: Fri, 8 Dec 2017 18:10:06 +0200 Subject: [PATCH 0399/1013] usb: xhci: fix TDS for MTK xHCI1.1 commit 72b663a99c074a8d073e7ecdae446cfb024ef551 upstream. For MTK's xHCI 1.0 or latter, TD size is the number of max packet sized packets remaining in the TD, not including this TRB (following spec). For MTK's xHCI 0.96 and older, TD size is the number of max packet sized packets remaining in the TD, including this TRB (not following spec). Signed-off-by: Chunfeng Yun Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-ring.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 353520005c13..6996235e34a9 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -3121,7 +3121,7 @@ static u32 xhci_td_remainder(struct xhci_hcd *xhci, int transferred, { u32 maxp, total_packet_count; - /* MTK xHCI is mostly 0.97 but contains some features from 1.0 */ + /* MTK xHCI 0.96 contains some features from 1.0 */ if (xhci->hci_version < 0x100 && !(xhci->quirks & XHCI_MTK_HOST)) return ((td_total_len - transferred) >> 10); @@ -3130,8 +3130,8 @@ static u32 xhci_td_remainder(struct xhci_hcd *xhci, int transferred, trb_buff_len == td_total_len) return 0; - /* for MTK xHCI, TD size doesn't include this TRB */ - if (xhci->quirks & XHCI_MTK_HOST) + /* for MTK xHCI 0.96, TD size include this TRB, but not in 1.x */ + if ((xhci->quirks & XHCI_MTK_HOST) && (xhci->hci_version < 0x100)) trb_buff_len = 0; maxp = usb_endpoint_maxp(&urb->ep->desc); From 2b3ff282dff3a42a081bef0f8249a99f64d7669f Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Fri, 8 Dec 2017 18:10:05 +0200 Subject: [PATCH 0400/1013] xhci: Don't add a virt_dev to the devs array before it's fully allocated commit 5d9b70f7d52eb14bb37861c663bae44de9521c35 upstream. Avoid null pointer dereference if some function is walking through the devs array accessing members of a new virt_dev that is mid allocation. Add the virt_dev to xhci->devs[i] _after_ the virt_device and all its members are properly allocated. issue found by KASAN: null-ptr-deref in xhci_find_slot_id_by_port "Quick analysis suggests that xhci_alloc_virt_device() is not mutex protected. If so, there is a time frame where xhci->devs[slot_id] is set but not fully initialized. Specifically, xhci->devs[i]->udev can be NULL." 
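The general rule being applied: never publish a pointer into shared state (here xhci->devs[slot_id]) until the object behind it is fully constructed, and on failure unwind through local variables rather than through the published slot. A generic sketch of that pattern, with the locking and data types of the real driver stripped away:

  #include <stdlib.h>

  struct virt_dev {
          void *out_ctx;
          void *in_ctx;
  };

  /* table that other code paths may be walking concurrently */
  static struct virt_dev *devs[256];

  static int alloc_virt_device(int slot_id)
  {
          struct virt_dev *dev = calloc(1, sizeof(*dev));

          if (!dev)
                  return 0;

          dev->out_ctx = malloc(64);
          dev->in_ctx = malloc(64);
          if (!dev->out_ctx || !dev->in_ctx)
                  goto fail;

          devs[slot_id] = dev;    /* publish only once fully initialised */
          return 1;

  fail:
          free(dev->in_ctx);      /* free(NULL) is harmless */
          free(dev->out_ctx);
          free(dev);
          return 0;
  }

  int main(void)
  {
          return alloc_virt_device(1) ? 0 : 1;
  }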
Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-mem.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c index 97f30eb7dac0..ccdc971283d0 100644 --- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -983,10 +983,9 @@ int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id, return 0; } - xhci->devs[slot_id] = kzalloc(sizeof(*xhci->devs[slot_id]), flags); - if (!xhci->devs[slot_id]) + dev = kzalloc(sizeof(*dev), flags); + if (!dev) return 0; - dev = xhci->devs[slot_id]; /* Allocate the (output) device context that will be used in the HC. */ dev->out_ctx = xhci_alloc_container_ctx(xhci, XHCI_CTX_TYPE_DEVICE, flags); @@ -1027,9 +1026,17 @@ int xhci_alloc_virt_device(struct xhci_hcd *xhci, int slot_id, trace_xhci_alloc_virt_device(dev); + xhci->devs[slot_id] = dev; + return 1; fail: - xhci_free_virt_device(xhci, slot_id); + + if (dev->in_ctx) + xhci_free_container_ctx(xhci, dev->in_ctx); + if (dev->out_ctx) + xhci_free_container_ctx(xhci, dev->out_ctx); + kfree(dev); + return 0; } From 4995bfa99096410af00aa0b1b374005494c709f3 Mon Sep 17 00:00:00 2001 From: Daniel Jurgens Date: Tue, 5 Dec 2017 22:30:01 +0200 Subject: [PATCH 0401/1013] IB/core: Bound check alternate path port number commit 4cae8ff136782d77b108cb3a5ba53e60597ba3a6 upstream. The alternate port number is used as an array index in the IB security implementation, invalid values can result in a kernel panic. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Signed-off-by: Daniel Jurgens Reviewed-by: Parav Pandit Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/uverbs_cmd.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 52a2cf2d83aa..d8f540054392 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -1982,6 +1982,12 @@ static int modify_qp(struct ib_uverbs_file *file, goto release_qp; } + if ((cmd->base.attr_mask & IB_QP_ALT_PATH) && + !rdma_is_port_valid(qp->device, cmd->base.alt_port_num)) { + ret = -EINVAL; + goto release_qp; + } + attr->qp_state = cmd->base.qp_state; attr->cur_qp_state = cmd->base.cur_qp_state; attr->path_mtu = cmd->base.path_mtu; From 74b6274f8c5e81d7243ca2dea4f80c8585b3d0e0 Mon Sep 17 00:00:00 2001 From: Daniel Jurgens Date: Tue, 5 Dec 2017 22:30:02 +0200 Subject: [PATCH 0402/1013] IB/core: Don't enforce PKey security on SMI MADs commit 0fbe8f575b15585eec3326e43708fbbc024e8486 upstream. Per the infiniband spec an SMI MAD can have any PKey. Checking the pkey on SMI MADs is not necessary, and it seems that some older adapters using the mthca driver don't follow the convention of using the default PKey, resulting in false denials, or errors querying the PKey cache. SMI MAD security is still enforced, only agents allowed to manage the subnet are able to receive or send SMI MADs. 
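Condensed, the resulting decision order is: classify the QP first, and only fall through to the PKey lookup for non-SMI traffic. A minimal model of that ordering; the pkey_ok parameter stands in for the real ib_security_pkey_access() call.

  #include <errno.h>

  enum qp_type { QPT_SMI, QPT_OTHER };

  static int mad_enforce_security(enum qp_type type, int smp_allowed,
                                  int pkey_ok)
  {
          if (type == QPT_SMI)
                  return smp_allowed ? 0 : -EACCES;       /* no PKey check */

          return pkey_ok ? 0 : -EACCES;
  }

  int main(void)
  {
          /* an SMI MAD carrying an arbitrary PKey is still accepted */
          return mad_enforce_security(QPT_SMI, 1, 0) == 0 ? 0 : 1;
  }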
Reported-by: Chris Blake Fixes: 47a2b338fe63 ("IB/core: Enforce security on management datagrams") Signed-off-by: Daniel Jurgens Reviewed-by: Parav Pandit Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/security.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c index a337386652b0..feafdb961c48 100644 --- a/drivers/infiniband/core/security.c +++ b/drivers/infiniband/core/security.c @@ -739,8 +739,11 @@ int ib_mad_enforce_security(struct ib_mad_agent_private *map, u16 pkey_index) if (!rdma_protocol_ib(map->agent.device, map->agent.port_num)) return 0; - if (map->agent.qp->qp_type == IB_QPT_SMI && !map->agent.smp_allowed) - return -EACCES; + if (map->agent.qp->qp_type == IB_QPT_SMI) { + if (!map->agent.smp_allowed) + return -EACCES; + return 0; + } return ib_security_pkey_access(map->agent.device, map->agent.port_num, From 268b2cc325061c60fa2dccfa4d61495e9306c4fa Mon Sep 17 00:00:00 2001 From: Scott Mayhew Date: Fri, 8 Dec 2017 16:00:12 -0500 Subject: [PATCH 0403/1013] nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests commit dc4fd9ab01ab379ae5af522b3efd4187a7c30a31 upstream. If there were no commit requests, then nfs_commit_inode() should not wait on the commit or mark the inode dirty, otherwise the following BUG_ON can be triggered: [ 1917.130762] kernel BUG at fs/inode.c:578! [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1] [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G ------------ T 3.10.0-768.el7.ppc64 #1 [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000 [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80 [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700 Tainted: G ------------ T (3.10.0-768.el7.ppc64) [ 1917.130820] MSR: 8000000100029032 CR: 22002822 XER: 20000000 [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1 GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750 GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03 GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8 GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0 GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8 [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350 [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350 [ 1917.130873] Call Trace: [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable) [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270 [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0 [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs] [ 1917.130900] [c00000004cea3c00] 
[c000000000328a20] .deactivate_locked_super+0xa0/0x1b0 [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180 [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150 [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100 [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60 [ 1917.130919] Instruction dump: [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0 [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134 Signed-off-by: Scott Mayhew Signed-off-by: Anna Schumaker Signed-off-by: Greg Kroah-Hartman --- fs/nfs/write.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/nfs/write.c b/fs/nfs/write.c index babebbccae2a..de325804941d 100644 --- a/fs/nfs/write.c +++ b/fs/nfs/write.c @@ -1889,6 +1889,8 @@ int nfs_commit_inode(struct inode *inode, int how) if (res) error = nfs_generic_commit_list(inode, &head, how, &cinfo); nfs_commit_end(cinfo.mds); + if (res == 0) + return res; if (error < 0) goto out_error; if (!may_wait) From db9d4784eb5ba8c69ea6b1a7b7c7c7af0b6bdf4d Mon Sep 17 00:00:00 2001 From: Steve Capper Date: Fri, 1 Dec 2017 17:22:14 +0000 Subject: [PATCH 0404/1013] arm64: mm: Fix pte_mkclean, pte_mkdirty semantics commit 8781bcbc5e69d7da69e84c7044ca0284848d5d01 upstream. On systems with hardware dirty bit management, the ltp madvise09 unit test fails due to dirty bit information being lost and pages being incorrectly freed. This was bisected to: arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect() Reverting this commit leads to a separate problem, that the unit test retains pages that should have been dropped due to the function madvise_free_pte_range(.) not cleaning pte's properly. Currently pte_mkclean only clears the software dirty bit, thus the following code sequence can appear: pte = pte_mkclean(pte); if (pte_dirty(pte)) // this condition can return true with HW DBM! This patch also adjusts pte_mkclean to set PTE_RDONLY thus effectively clearing both the SW and HW dirty information. In order for this to function on systems without HW DBM, we need to also adjust pte_mkdirty to remove the read only bit from writable pte's to avoid infinite fault loops. 
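The quoted sequence is easier to see with the bits spelled out. Below is a stand-alone model of the arm64 convention, a software dirty bit plus hardware DBM where "writable and not read-only" also counts as dirty; the bit positions are arbitrary, only the relationships matter.

  #include <stdio.h>

  #define PTE_DIRTY       (1u << 0)       /* software dirty bit */
  #define PTE_RDONLY      (1u << 1)
  #define PTE_WRITE       (1u << 2)       /* a.k.a. DBM */

  /* hardware-dirty: writable (DBM set) with RDONLY already cleared */
  static int pte_hw_dirty(unsigned int pte)
  {
          return (pte & PTE_WRITE) && !(pte & PTE_RDONLY);
  }

  static int pte_dirty(unsigned int pte)
  {
          return (pte & PTE_DIRTY) || pte_hw_dirty(pte);
  }

  /* old behaviour: only the software bit is cleared */
  static unsigned int pte_mkclean_old(unsigned int pte)
  {
          return pte & ~PTE_DIRTY;
  }

  /* fixed behaviour: also set RDONLY, removing the hardware-dirty encoding */
  static unsigned int pte_mkclean_new(unsigned int pte)
  {
          return (pte & ~PTE_DIRTY) | PTE_RDONLY;
  }

  int main(void)
  {
          unsigned int pte = PTE_WRITE | PTE_DIRTY;       /* writable and dirty */

          printf("old: dirty after mkclean? %d\n", pte_dirty(pte_mkclean_old(pte)));
          printf("new: dirty after mkclean? %d\n", pte_dirty(pte_mkclean_new(pte)));
          return 0;
  }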
Fixes: 64c26841b349 ("arm64: Ignore hardware dirty bit updates in ptep_set_wrprotect()") Reported-by: Bhupinder Thakur Tested-by: Bhupinder Thakur Reviewed-by: Catalin Marinas Signed-off-by: Steve Capper Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/pgtable.h | 33 +++++++++++++++++--------------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index c9530b5b5ca8..960d05c8816a 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -149,12 +149,20 @@ static inline pte_t pte_mkwrite(pte_t pte) static inline pte_t pte_mkclean(pte_t pte) { - return clear_pte_bit(pte, __pgprot(PTE_DIRTY)); + pte = clear_pte_bit(pte, __pgprot(PTE_DIRTY)); + pte = set_pte_bit(pte, __pgprot(PTE_RDONLY)); + + return pte; } static inline pte_t pte_mkdirty(pte_t pte) { - return set_pte_bit(pte, __pgprot(PTE_DIRTY)); + pte = set_pte_bit(pte, __pgprot(PTE_DIRTY)); + + if (pte_write(pte)) + pte = clear_pte_bit(pte, __pgprot(PTE_RDONLY)); + + return pte; } static inline pte_t pte_mkold(pte_t pte) @@ -642,28 +650,23 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ /* - * ptep_set_wrprotect - mark read-only while preserving the hardware update of - * the Access Flag. + * ptep_set_wrprotect - mark read-only while trasferring potential hardware + * dirty status (PTE_DBM && !PTE_RDONLY) to the software PTE_DIRTY bit. */ #define __HAVE_ARCH_PTEP_SET_WRPROTECT static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long address, pte_t *ptep) { pte_t old_pte, pte; - /* - * ptep_set_wrprotect() is only called on CoW mappings which are - * private (!VM_SHARED) with the pte either read-only (!PTE_WRITE && - * PTE_RDONLY) or writable and software-dirty (PTE_WRITE && - * !PTE_RDONLY && PTE_DIRTY); see is_cow_mapping() and - * protection_map[]. There is no race with the hardware update of the - * dirty state: clearing of PTE_RDONLY when PTE_WRITE (a.k.a. PTE_DBM) - * is set. - */ - VM_WARN_ONCE(pte_write(*ptep) && !pte_dirty(*ptep), - "%s: potential race with hardware DBM", __func__); pte = READ_ONCE(*ptep); do { old_pte = pte; + /* + * If hardware-dirty (PTE_WRITE/DBM bit set and PTE_RDONLY + * clear), set the PTE_DIRTY bit. + */ + if (pte_hw_dirty(pte)) + pte = pte_mkdirty(pte); pte = pte_wrprotect(pte); pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep), pte_val(old_pte), pte_val(pte)); From cdcfd13a66eb53fd96dc480de79c423aa0de9d39 Mon Sep 17 00:00:00 2001 From: Steve Capper Date: Mon, 4 Dec 2017 14:13:05 +0000 Subject: [PATCH 0405/1013] arm64: Initialise high_memory global variable earlier commit f24e5834a2c3f6c5f814a417f858226f0a010ade upstream. The high_memory global variable is used by cma_declare_contiguous(.) before it is defined. We don't notice this as we compute __pa(high_memory - 1), and it looks like we're processing a VA from the direct linear map. This problem becomes apparent when we flip the kernel virtual address space and the linear map is moved to the bottom of the kernel VA space. This patch moves the initialisation of high_memory before it used. 
Fixes: f7426b983a6a ("mm: cma: adjust address limit to avoid hitting low/high memory boundary") Signed-off-by: Steve Capper Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/mm/init.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 5960bef0170d..00e7b900ca41 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -476,6 +476,8 @@ void __init arm64_memblock_init(void) reserve_elfcorehdr(); + high_memory = __va(memblock_end_of_DRAM() - 1) + 1; + dma_contiguous_reserve(arm64_dma_phys_limit); memblock_allow_resize(); @@ -502,7 +504,6 @@ void __init bootmem_init(void) sparse_init(); zone_sizes_init(min, max); - high_memory = __va((max << PAGE_SHIFT) - 1) + 1; memblock_dump_all(); } From 9854611cf5e604773fecfa75bfafc03621df508c Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 13 Dec 2017 11:45:42 +0000 Subject: [PATCH 0406/1013] arm64: fix CONFIG_DEBUG_WX address reporting commit 1d08a044cf12aee37dfd54837558e3295287b343 upstream. In ptdump_check_wx(), we pass walk_pgd() a start address of 0 (rather than VA_START) for the init_mm. This means that any reported W&X addresses are offset by VA_START, which is clearly wrong and can make them appear like userspace addresses. Fix this by telling the ptdump code that we're walking init_mm starting at VA_START. We don't need to update the addr_markers, since these are still valid bounds regardless. Fixes: 1404d6f13e47 ("arm64: dump: Add checking for writable and exectuable pages") Signed-off-by: Mark Rutland Cc: Kees Cook Cc: Laura Abbott Reported-by: Timur Tabi Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- arch/arm64/mm/dump.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c index ca74a2aace42..7b60d62ac593 100644 --- a/arch/arm64/mm/dump.c +++ b/arch/arm64/mm/dump.c @@ -389,7 +389,7 @@ void ptdump_check_wx(void) .check_wx = true, }; - walk_pgd(&st, &init_mm, 0); + walk_pgd(&st, &init_mm, VA_START); note_page(&st, 0, 0, 0); if (st.wx_pages || st.uxn_pages) pr_warn("Checked W+X mappings: FAILED, %lu W+X pages found, %lu non-UXN pages found\n", From 39c4330c9779d0acc213a0ace72b88a7b7a06012 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 5 Dec 2017 16:57:51 -0800 Subject: [PATCH 0407/1013] scsi: core: Fix a scsi_show_rq() NULL pointer dereference commit 14e3062fb18532175af4d1c4073597999f7a2248 upstream. Avoid that scsi_show_rq() triggers a NULL pointer dereference if called after sd_uninit_command(). Swap the NULL pointer assignment and the mempool_free() call in sd_uninit_command() to make it less likely that scsi_show_rq() triggers a use-after-free. Note: even with these changes scsi_show_rq() can trigger a use-after-free but that's a lesser evil than e.g. suppressing debug information for T10 PI Type 2 commands completely. This patch fixes the following oops: BUG: unable to handle kernel NULL pointer dereference at (null) IP: scsi_format_opcode_name+0x1a/0x1c0 CPU: 1 PID: 1881 Comm: cat Not tainted 4.14.0-rc2.blk_mq_io_hang+ #516 Call Trace: __scsi_format_command+0x27/0xc0 scsi_show_rq+0x5c/0xc0 __blk_mq_debugfs_rq_show+0x116/0x130 blk_mq_debugfs_rq_show+0xe/0x10 seq_read+0xfe/0x3b0 full_proxy_read+0x54/0x90 __vfs_read+0x37/0x160 vfs_read+0x96/0x130 SyS_read+0x55/0xc0 entry_SYSCALL_64_fastpath+0x1a/0xa5 [mkp: added Type 2] Fixes: 0eebd005dd07 ("scsi: Implement blk_mq_ops.show_rq()") Reported-by: Ming Lei Signed-off-by: Bart Van Assche Cc: James E.J. 
Bottomley Cc: Martin K. Petersen Cc: Ming Lei Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Johannes Thumshirn Signed-off-by: Martin K. Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/scsi_debugfs.c | 6 ++++-- drivers/scsi/sd.c | 4 +++- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_debugfs.c b/drivers/scsi/scsi_debugfs.c index 01f08c03f2c1..c3765d29fd3f 100644 --- a/drivers/scsi/scsi_debugfs.c +++ b/drivers/scsi/scsi_debugfs.c @@ -8,9 +8,11 @@ void scsi_show_rq(struct seq_file *m, struct request *rq) { struct scsi_cmnd *cmd = container_of(scsi_req(rq), typeof(*cmd), req); int msecs = jiffies_to_msecs(jiffies - cmd->jiffies_at_alloc); - char buf[80]; + const u8 *const cdb = READ_ONCE(cmd->cmnd); + char buf[80] = "(?)"; - __scsi_format_command(buf, sizeof(buf), cmd->cmnd, cmd->cmd_len); + if (cdb) + __scsi_format_command(buf, sizeof(buf), cdb, cmd->cmd_len); seq_printf(m, ", .cmd=%s, .retries=%d, allocated %d.%03d s ago", buf, cmd->retries, msecs / 1000, msecs % 1000); } diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index d175c5c5ccf8..d841743b2107 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -1284,6 +1284,7 @@ static int sd_init_command(struct scsi_cmnd *cmd) static void sd_uninit_command(struct scsi_cmnd *SCpnt) { struct request *rq = SCpnt->request; + u8 *cmnd; if (SCpnt->flags & SCMD_ZONE_WRITE_LOCK) sd_zbc_write_unlock_zone(SCpnt); @@ -1292,9 +1293,10 @@ static void sd_uninit_command(struct scsi_cmnd *SCpnt) __free_page(rq->special_vec.bv_page); if (SCpnt->cmnd != scsi_req(rq)->cmd) { - mempool_free(SCpnt->cmnd, sd_cdb_pool); + cmnd = SCpnt->cmnd; SCpnt->cmnd = NULL; SCpnt->cmd_len = 0; + mempool_free(cmnd, sd_cdb_pool); } } From c2f79ce3fe441486b3ab3dfce0f7bf49bd07dc3c Mon Sep 17 00:00:00 2001 From: Jason Yan Date: Mon, 11 Dec 2017 15:03:33 +0800 Subject: [PATCH 0408/1013] scsi: libsas: fix length error in sas_smp_handler() commit 621f6401fdeefe96dfe9eab4b167c7c39f552bb0 upstream. The return value of smp_execute_task_sg() is the untransferred residual, but bsg_job_done() requires the length of payload received. This makes SMP passthrough commands from userland by sg ioctl to libsas get a wrong response. The userland tools such as smp_utils failed because of these wrong responses: ~#smp_discover /dev/bsg/expander-2\:13 response too short, len=0 ~#smp_discover /dev/bsg/expander-2\:134 response too short, len=0 Fix this by passing the actual received length to bsg_job_done(). And if smp_execute_task_sg() returns 0, this means received length is exactly the buffer length. [mkp: typo] Fixes: 651a01364994 ("scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough") Signed-off-by: Jason Yan Reported-by: chenqilin Tested-by: chenqilin CC: Christoph Hellwig Signed-off-by: Martin K. 
Petersen Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/libsas/sas_expander.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c index 6b4fd2375178..324d8d8c62de 100644 --- a/drivers/scsi/libsas/sas_expander.c +++ b/drivers/scsi/libsas/sas_expander.c @@ -2145,7 +2145,7 @@ void sas_smp_handler(struct bsg_job *job, struct Scsi_Host *shost, struct sas_rphy *rphy) { struct domain_device *dev; - unsigned int reslen = 0; + unsigned int rcvlen = 0; int ret = -EINVAL; /* no rphy means no smp target support (ie aic94xx host) */ @@ -2179,12 +2179,12 @@ void sas_smp_handler(struct bsg_job *job, struct Scsi_Host *shost, ret = smp_execute_task_sg(dev, job->request_payload.sg_list, job->reply_payload.sg_list); - if (ret > 0) { - /* positive number is the untransferred residual */ - reslen = ret; + if (ret >= 0) { + /* bsg_job_done() requires the length received */ + rcvlen = job->reply_payload.payload_len - ret; ret = 0; } out: - bsg_job_done(job, ret, reslen); + bsg_job_done(job, ret, rcvlen); } From 282e4b259d4f9429f7813efc8076dd601ed59f45 Mon Sep 17 00:00:00 2001 From: Steven Rostedt Date: Sat, 2 Dec 2017 13:04:54 -0500 Subject: [PATCH 0409/1013] sched/rt: Do not pull from current CPU if only one CPU to pull commit f73c52a5bcd1710994e53fbccc378c42b97a06b6 upstream. Daniel Wagner reported a crash on the BeagleBone Black SoC. This is a single CPU architecture, and does not have a functional arch_send_call_function_single_ipi() implementation which can crash the kernel if that is called. As it only has one CPU, it shouldn't be called, but if the kernel is compiled for SMP, the push/pull RT scheduling logic now calls it for irq_work if the one CPU is overloaded, it can use that function to call itself and crash the kernel. Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system only has a single CPU. But SCHED_FEAT is a constant if sched debugging is turned off. Another fix can also be used, and this should also help with normal SMP machines. That is, do not initiate the pull code if there's only one RT overloaded CPU, and that CPU happens to be the current CPU that is scheduling in a lower priority task. Even on a system with many CPUs, if there's many RT tasks waiting to run on a single CPU, and that CPU schedules in another RT task of lower priority, it will initiate the PULL logic in case there's a higher priority RT task on another CPU that is waiting to run. But if there is no other CPU with waiting RT tasks, it will initiate the RT pull logic on itself (as it still has RT tasks waiting to run). This is a wasted effort. Not only does this help with SMP code where the current CPU is the only one with RT overloaded tasks, it should also solve the issue that Daniel encountered, because it will prevent the PULL logic from executing, as there's only one CPU on the system, and the check added here will cause it to exit the RT pull code. 
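Detached from the scheduler internals, the new early exit is simply "do not start a pull when the only RT-overloaded CPU is ourselves". A toy illustration of that condition:

  #include <stdbool.h>
  #include <stdio.h>

  #define NR_CPUS 4

  /* toy stand-in for rd->rto_mask: CPUs with queued RT tasks they
   * cannot run right now */
  static bool rt_overloaded[NR_CPUS];

  static bool should_try_pull(int this_cpu)
  {
          int cpu, count = 0, only_cpu = -1;

          for (cpu = 0; cpu < NR_CPUS; cpu++) {
                  if (rt_overloaded[cpu]) {
                          count++;
                          only_cpu = cpu;
                  }
          }

          if (count == 0)
                  return false;           /* nothing to pull anywhere */
          if (count == 1 && only_cpu == this_cpu)
                  return false;           /* we are the only overloaded CPU */
          return true;
  }

  int main(void)
  {
          rt_overloaded[2] = true;
          printf("cpu2 pulls? %d\n", should_try_pull(2)); /* 0: skip, no self-IPI */
          printf("cpu0 pulls? %d\n", should_try_pull(0)); /* 1: CPU 2 has waiters */
          return 0;
  }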
Reported-by: Daniel Wagner Signed-off-by: Steven Rostedt (VMware) Acked-by: Peter Zijlstra Cc: Linus Torvalds Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Cc: linux-rt-users Fixes: 4bdced5c9 ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.home Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- kernel/sched/rt.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index d8c43d73e078..7464c5c4de46 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -2034,8 +2034,9 @@ static void pull_rt_task(struct rq *this_rq) bool resched = false; struct task_struct *p; struct rq *src_rq; + int rt_overload_count = rt_overloaded(this_rq); - if (likely(!rt_overloaded(this_rq))) + if (likely(!rt_overload_count)) return; /* @@ -2044,6 +2045,11 @@ static void pull_rt_task(struct rq *this_rq) */ smp_rmb(); + /* If we are the only overloaded CPU do nothing */ + if (rt_overload_count == 1 && + cpumask_test_cpu(this_rq->cpu, this_rq->rd->rto_mask)) + return; + #ifdef HAVE_RT_PUSH_IPI if (sched_feat(RT_PUSH_IPI)) { tell_cpu_to_push(this_rq); From fbce429b410cc443ec3fa3293a0e4e5804ae4f2d Mon Sep 17 00:00:00 2001 From: "monty_pavel@sina.com" Date: Sat, 25 Nov 2017 01:43:50 +0800 Subject: [PATCH 0410/1013] dm: fix various targets to dm_register_target after module __init resources created commit 7e6358d244e4706fe612a77b9c36519a33600ac0 upstream. A NULL pointer is seen if two concurrent "vgchange -ay -K " processes race to load the dm-thin-pool module: PID: 25992 TASK: ffff883cd7d23500 CPU: 4 COMMAND: "vgchange" #0 [ffff883cd743d600] machine_kexec at ffffffff81038fa9 0000001 [ffff883cd743d660] crash_kexec at ffffffff810c5992 0000002 [ffff883cd743d730] oops_end at ffffffff81515c90 0000003 [ffff883cd743d760] no_context at ffffffff81049f1b 0000004 [ffff883cd743d7b0] __bad_area_nosemaphore at ffffffff8104a1a5 0000005 [ffff883cd743d800] bad_area at ffffffff8104a2ce 0000006 [ffff883cd743d830] __do_page_fault at ffffffff8104aa6f 0000007 [ffff883cd743d950] do_page_fault at ffffffff81517bae 0000008 [ffff883cd743d980] page_fault at ffffffff81514f95 [exception RIP: kmem_cache_alloc+108] RIP: ffffffff8116ef3c RSP: ffff883cd743da38 RFLAGS: 00010046 RAX: 0000000000000004 RBX: ffffffff81121b90 RCX: ffff881bf1e78cc0 RDX: 0000000000000000 RSI: 00000000000000d0 RDI: 0000000000000000 RBP: ffff883cd743da68 R8: ffff881bf1a4eb00 R9: 0000000080042000 R10: 0000000000002000 R11: 0000000000000000 R12: 00000000000000d0 R13: 0000000000000000 R14: 00000000000000d0 R15: 0000000000000246 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 0000009 [ffff883cd743da70] mempool_alloc_slab at ffffffff81121ba5 0000010 [ffff883cd743da80] mempool_create_node at ffffffff81122083 0000011 [ffff883cd743dad0] mempool_create at ffffffff811220f4 0000012 [ffff883cd743dae0] pool_ctr at ffffffffa08de049 [dm_thin_pool] 0000013 [ffff883cd743dbd0] dm_table_add_target at ffffffffa0005f2f [dm_mod] 0000014 [ffff883cd743dc30] table_load at ffffffffa0008ba9 [dm_mod] 0000015 [ffff883cd743dc90] ctl_ioctl at ffffffffa0009dc4 [dm_mod] The race results in a NULL pointer because: Process A (vgchange -ay -K): a. send DM_LIST_VERSIONS_CMD ioctl; b. pool_target not registered; c. modprobe dm_thin_pool and wait until end. Process B (vgchange -ay -K): a. send DM_LIST_VERSIONS_CMD ioctl; b. pool_target registered; c. table_load->dm_table_add_target->pool_ctr; d. _new_mapping_cache is NULL and panic. Note: 1. 
process A and process B are two concurrent processes. 2. pool_target can be detected by process B but _new_mapping_cache initialization has not ended. To fix dm-thin-pool, and other targets (cache, multipath, and snapshot) with the same problem, simply dm_register_target() after all resources created during module init (as labelled with __init) are finished. Signed-off-by: monty Signed-off-by: Mike Snitzer Signed-off-by: Greg Kroah-Hartman --- drivers/md/dm-cache-target.c | 12 ++++----- drivers/md/dm-mpath.c | 18 +++++++------- drivers/md/dm-snap.c | 48 ++++++++++++++++++------------------ drivers/md/dm-thin.c | 22 ++++++++--------- 4 files changed, 49 insertions(+), 51 deletions(-) diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c index 0b7edfd0b454..71c3507df9a0 100644 --- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -3554,18 +3554,18 @@ static int __init dm_cache_init(void) { int r; - r = dm_register_target(&cache_target); - if (r) { - DMERR("cache target registration failed: %d", r); - return r; - } - migration_cache = KMEM_CACHE(dm_cache_migration, 0); if (!migration_cache) { dm_unregister_target(&cache_target); return -ENOMEM; } + r = dm_register_target(&cache_target); + if (r) { + DMERR("cache target registration failed: %d", r); + return r; + } + return 0; } diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index e8094d8fbe0d..35e82b14ded7 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -1965,13 +1965,6 @@ static int __init dm_multipath_init(void) { int r; - r = dm_register_target(&multipath_target); - if (r < 0) { - DMERR("request-based register failed %d", r); - r = -EINVAL; - goto bad_register_target; - } - kmultipathd = alloc_workqueue("kmpathd", WQ_MEM_RECLAIM, 0); if (!kmultipathd) { DMERR("failed to create workqueue kmpathd"); @@ -1993,13 +1986,20 @@ static int __init dm_multipath_init(void) goto bad_alloc_kmpath_handlerd; } + r = dm_register_target(&multipath_target); + if (r < 0) { + DMERR("request-based register failed %d", r); + r = -EINVAL; + goto bad_register_target; + } + return 0; +bad_register_target: + destroy_workqueue(kmpath_handlerd); bad_alloc_kmpath_handlerd: destroy_workqueue(kmultipathd); bad_alloc_kmultipathd: - dm_unregister_target(&multipath_target); -bad_register_target: return r; } diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index 1113b42e1eda..a0613bd8ed00 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -2411,24 +2411,6 @@ static int __init dm_snapshot_init(void) return r; } - r = dm_register_target(&snapshot_target); - if (r < 0) { - DMERR("snapshot target register failed %d", r); - goto bad_register_snapshot_target; - } - - r = dm_register_target(&origin_target); - if (r < 0) { - DMERR("Origin target register failed %d", r); - goto bad_register_origin_target; - } - - r = dm_register_target(&merge_target); - if (r < 0) { - DMERR("Merge target register failed %d", r); - goto bad_register_merge_target; - } - r = init_origin_hash(); if (r) { DMERR("init_origin_hash failed."); @@ -2449,19 +2431,37 @@ static int __init dm_snapshot_init(void) goto bad_pending_cache; } + r = dm_register_target(&snapshot_target); + if (r < 0) { + DMERR("snapshot target register failed %d", r); + goto bad_register_snapshot_target; + } + + r = dm_register_target(&origin_target); + if (r < 0) { + DMERR("Origin target register failed %d", r); + goto bad_register_origin_target; + } + + r = dm_register_target(&merge_target); + if (r < 0) { + DMERR("Merge target register failed 
%d", r); + goto bad_register_merge_target; + } + return 0; -bad_pending_cache: - kmem_cache_destroy(exception_cache); -bad_exception_cache: - exit_origin_hash(); -bad_origin_hash: - dm_unregister_target(&merge_target); bad_register_merge_target: dm_unregister_target(&origin_target); bad_register_origin_target: dm_unregister_target(&snapshot_target); bad_register_snapshot_target: + kmem_cache_destroy(pending_cache); +bad_pending_cache: + kmem_cache_destroy(exception_cache); +bad_exception_cache: + exit_origin_hash(); +bad_origin_hash: dm_exception_store_exit(); return r; diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c index 1e25705209c2..02e42ba2ecbc 100644 --- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -4355,30 +4355,28 @@ static struct target_type thin_target = { static int __init dm_thin_init(void) { - int r; + int r = -ENOMEM; pool_table_init(); + _new_mapping_cache = KMEM_CACHE(dm_thin_new_mapping, 0); + if (!_new_mapping_cache) + return r; + r = dm_register_target(&thin_target); if (r) - return r; + goto bad_new_mapping_cache; r = dm_register_target(&pool_target); if (r) - goto bad_pool_target; - - r = -ENOMEM; - - _new_mapping_cache = KMEM_CACHE(dm_thin_new_mapping, 0); - if (!_new_mapping_cache) - goto bad_new_mapping_cache; + goto bad_thin_target; return 0; -bad_new_mapping_cache: - dm_unregister_target(&pool_target); -bad_pool_target: +bad_thin_target: dm_unregister_target(&thin_target); +bad_new_mapping_cache: + kmem_cache_destroy(_new_mapping_cache); return r; } From 7dd1362247ebfe06ab1f58d7fee739c5f773a0ac Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 14 Dec 2017 21:24:08 -0500 Subject: [PATCH 0411/1013] SUNRPC: Fix a race in the receive code path commit 90d91b0cd371193d9dbfa9beacab8ab9a4cb75e0 upstream. We must ensure that the call to rpc_sleep_on() in xprt_transmit() cannot race with the call to xprt_complete_rqst(). Reported-by: Chuck Lever Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=317 Fixes: ce7c252a8c74 ("SUNRPC: Add a separate spinlock to protect..") Reviewed-by: Chuck Lever Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker Signed-off-by: Greg Kroah-Hartman --- net/sunrpc/xprt.c | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index 898485e3ece4..8eb0c4f3b3e9 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1001,6 +1001,7 @@ void xprt_transmit(struct rpc_task *task) { struct rpc_rqst *req = task->tk_rqstp; struct rpc_xprt *xprt = req->rq_xprt; + unsigned int connect_cookie; int status, numreqs; dprintk("RPC: %5u xprt_transmit(%u)\n", task->tk_pid, req->rq_slen); @@ -1024,6 +1025,7 @@ void xprt_transmit(struct rpc_task *task) } else if (!req->rq_bytes_sent) return; + connect_cookie = xprt->connect_cookie; req->rq_xtime = ktime_get(); status = xprt->ops->send_request(task); trace_xprt_transmit(xprt, req->rq_xid, status); @@ -1047,20 +1049,28 @@ void xprt_transmit(struct rpc_task *task) xprt->stat.bklog_u += xprt->backlog.qlen; xprt->stat.sending_u += xprt->sending.qlen; xprt->stat.pending_u += xprt->pending.qlen; + spin_unlock_bh(&xprt->transport_lock); - /* Don't race with disconnect */ - if (!xprt_connected(xprt)) - task->tk_status = -ENOTCONN; - else { + req->rq_connect_cookie = connect_cookie; + if (rpc_reply_expected(task) && !READ_ONCE(req->rq_reply_bytes_recvd)) { /* - * Sleep on the pending queue since - * we're expecting a reply. + * Sleep on the pending queue if we're expecting a reply. 
+ * The spinlock ensures atomicity between the test of + * req->rq_reply_bytes_recvd, and the call to rpc_sleep_on(). */ - if (!req->rq_reply_bytes_recvd && rpc_reply_expected(task)) + spin_lock(&xprt->recv_lock); + if (!req->rq_reply_bytes_recvd) { rpc_sleep_on(&xprt->pending, task, xprt_timer); - req->rq_connect_cookie = xprt->connect_cookie; + /* + * Send an extra queue wakeup call if the + * connection was dropped in case the call to + * rpc_sleep_on() raced. + */ + if (!xprt_connected(xprt)) + xprt_wake_pending_tasks(xprt, -ENOTCONN); + } + spin_unlock(&xprt->recv_lock); } - spin_unlock_bh(&xprt->transport_lock); } static void xprt_add_backlog(struct rpc_xprt *xprt, struct rpc_task *task) From 2d8262155ab3d20219ca5815f65573592d717c3a Mon Sep 17 00:00:00 2001 From: Steve Wise Date: Mon, 27 Nov 2017 13:16:32 -0800 Subject: [PATCH 0412/1013] iw_cxgb4: only insert drain cqes if wq is flushed commit c058ecf6e455fac7346d46197a02398ead90851f upstream. Only insert our special drain CQEs to support ib_drain_sq/rq() after the wq is flushed. Otherwise, existing but not yet polled CQEs can be returned out of order to the user application. This can happen when the QP has exited RTS but not yet flushed the QP, which can happen during a normal close (vs abortive close). In addition never count the drain CQEs when determining how many CQEs need to be synthesized during the flush operation. This latter issue should never happen if the QP is properly flushed before inserting the drain CQE, but I wanted to avoid corrupting the CQ state. So we handle it and log a warning once. Fixes: 4fe7c2962e11 ("iw_cxgb4: refactor sq/rq drain logic") Signed-off-by: Steve Wise Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/cxgb4/cq.c | 5 +++++ drivers/infiniband/hw/cxgb4/qp.c | 14 ++++++++++++-- 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c index be07da1997e6..eae8ea81c6e2 100644 --- a/drivers/infiniband/hw/cxgb4/cq.c +++ b/drivers/infiniband/hw/cxgb4/cq.c @@ -410,6 +410,11 @@ void c4iw_flush_hw_cq(struct c4iw_cq *chp) static int cqe_completes_wr(struct t4_cqe *cqe, struct t4_wq *wq) { + if (CQE_OPCODE(cqe) == C4IW_DRAIN_OPCODE) { + WARN_ONCE(1, "Unexpected DRAIN CQE qp id %u!\n", wq->sq.qid); + return 0; + } + if (CQE_OPCODE(cqe) == FW_RI_TERMINATE) return 0; diff --git a/drivers/infiniband/hw/cxgb4/qp.c b/drivers/infiniband/hw/cxgb4/qp.c index cb7fc0d35d1d..e69453665a17 100644 --- a/drivers/infiniband/hw/cxgb4/qp.c +++ b/drivers/infiniband/hw/cxgb4/qp.c @@ -868,7 +868,12 @@ int c4iw_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr, qhp = to_c4iw_qp(ibqp); spin_lock_irqsave(&qhp->lock, flag); - if (t4_wq_in_error(&qhp->wq)) { + + /* + * If the qp has been flushed, then just insert a special + * drain cqe. + */ + if (qhp->wq.flushed) { spin_unlock_irqrestore(&qhp->lock, flag); complete_sq_drain_wr(qhp, wr); return err; @@ -1012,7 +1017,12 @@ int c4iw_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr, qhp = to_c4iw_qp(ibqp); spin_lock_irqsave(&qhp->lock, flag); - if (t4_wq_in_error(&qhp->wq)) { + + /* + * If the qp has been flushed, then just insert a special + * drain cqe. + */ + if (qhp->wq.flushed) { spin_unlock_irqrestore(&qhp->lock, flag); complete_rq_drain_wr(qhp, wr); return err; From 1bd2b3061273f1bd7d691e3ea5c15e1f57f8ab6d Mon Sep 17 00:00:00 2001 From: "Kirill A. 
Shutemov" Date: Mon, 4 Dec 2017 15:40:55 +0300 Subject: [PATCH 0413/1013] x86/boot/compressed/64: Detect and handle 5-level paging at boot-time commit 08529078d8d9adf689bf39cc38d53979a0869970 upstream. Prerequisite for fixing the current problem of instantaneous reboots when a 5-level paging kernel is booted on 4-level paging hardware. At the same time this change prepares the decompression code to boot-time switching between 4- and 5-level paging. [ tglx: Folded the GCC < 5 fix. ] Fixes: 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y") Signed-off-by: Kirill A. Shutemov Signed-off-by: Thomas Gleixner Cc: Andi Kleen Cc: Andy Lutomirski Cc: linux-mm@kvack.org Cc: Cyrill Gorcunov Cc: Borislav Petkov Cc: Linus Torvalds Link: https://lkml.kernel.org/r/20171204124059.63515-2-kirill.shutemov@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/boot/compressed/Makefile | 1 + arch/x86/boot/compressed/head_64.S | 16 +++++++++++---- arch/x86/boot/compressed/pgtable_64.c | 28 +++++++++++++++++++++++++++ 3 files changed, 41 insertions(+), 4 deletions(-) create mode 100644 arch/x86/boot/compressed/pgtable_64.c diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile index 4b7575b00563..98018a621f6b 100644 --- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -78,6 +78,7 @@ vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o ifdef CONFIG_X86_64 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/pagetable.o + vmlinux-objs-y += $(obj)/pgtable_64.o endif $(obj)/eboot.o: KBUILD_CFLAGS += -fshort-wchar -mno-red-zone diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S index beb255b66447..4b3d92a37c80 100644 --- a/arch/x86/boot/compressed/head_64.S +++ b/arch/x86/boot/compressed/head_64.S @@ -289,10 +289,18 @@ ENTRY(startup_64) leaq boot_stack_end(%rbx), %rsp #ifdef CONFIG_X86_5LEVEL - /* Check if 5-level paging has already enabled */ - movq %cr4, %rax - testl $X86_CR4_LA57, %eax - jnz lvl5 + /* + * Check if we need to enable 5-level paging. + * RSI holds real mode data and need to be preserved across + * a function call. + */ + pushq %rsi + call l5_paging_required + popq %rsi + + /* If l5_paging_required() returned zero, we're done here. */ + cmpq $0, %rax + je lvl5 /* * At this point we are in long mode with 4-level paging enabled, diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c new file mode 100644 index 000000000000..b4469a37e9a1 --- /dev/null +++ b/arch/x86/boot/compressed/pgtable_64.c @@ -0,0 +1,28 @@ +#include + +/* + * __force_order is used by special_insns.h asm code to force instruction + * serialization. + * + * It is not referenced from the code, but GCC < 5 with -fPIE would fail + * due to an undefined symbol. Define it to make these ancient GCCs work. + */ +unsigned long __force_order; + +int l5_paging_required(void) +{ + /* Check if leaf 7 is supported. */ + + if (native_cpuid_eax(0) < 7) + return 0; + + /* Check if la57 is supported. */ + if (!(native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)))) + return 0; + + /* Check if 5-level paging has already been enabled. */ + if (native_read_cr4() & X86_CR4_LA57) + return 0; + + return 1; +} From e48a17d115f84048294ae366ff6e9dd7c8c6cca4 Mon Sep 17 00:00:00 2001 From: "Kirill A. 
Shutemov" Date: Mon, 4 Dec 2017 15:40:56 +0300 Subject: [PATCH 0414/1013] x86/boot/compressed/64: Print error if 5-level paging is not supported commit 6d7e0ba2d2be9e50cccba213baf07e0e183c1b24 upstream. If the machine does not support the paging mode for which the kernel was compiled, the boot process cannot continue. It's not possible to let the kernel detect the mismatch as it does not even reach the point where cpu features can be evaluted due to a triple fault in the KASLR setup. Instead of instantaneous silent reboot, emit an error message which gives the user the information why the boot fails. Fixes: 77ef56e4f0fb ("x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y") Reported-by: Borislav Petkov Signed-off-by: Kirill A. Shutemov Signed-off-by: Thomas Gleixner Tested-by: Borislav Petkov Cc: Andi Kleen Cc: Andy Lutomirski Cc: linux-mm@kvack.org Cc: Cyrill Gorcunov Cc: Linus Torvalds Link: https://lkml.kernel.org/r/20171204124059.63515-3-kirill.shutemov@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/boot/compressed/misc.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index b50c42455e25..98761a1576ce 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -169,6 +169,16 @@ void __puthex(unsigned long value) } } +static bool l5_supported(void) +{ + /* Check if leaf 7 is supported. */ + if (native_cpuid_eax(0) < 7) + return 0; + + /* Check if la57 is supported. */ + return native_cpuid_ecx(7) & (1 << (X86_FEATURE_LA57 & 31)); +} + #if CONFIG_X86_NEED_RELOCS static void handle_relocations(void *output, unsigned long output_len, unsigned long virt_addr) @@ -362,6 +372,12 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap, console_init(); debug_putstr("early console in extract_kernel\n"); + if (IS_ENABLED(CONFIG_X86_5LEVEL) && !l5_supported()) { + error("This linux kernel as configured requires 5-level paging\n" + "This CPU does not support the required 'cr4.la57' feature\n" + "Unable to boot - please use a kernel appropriate for your CPU\n"); + } + free_mem_ptr = heap; /* Heap */ free_mem_end_ptr = heap + BOOT_HEAP_SIZE; From 6e46e964e2978f64f30b89be2423093b43e77466 Mon Sep 17 00:00:00 2001 From: David Lechner Date: Sun, 3 Dec 2017 19:54:41 -0600 Subject: [PATCH 0415/1013] eeprom: at24: change nvmem stride to 1 commit 7f6d2ecd3d7acaf205ea7b3e96f9ffc55b92298b upstream. Trying to read the MAC address from an eeprom that has an offset that is not a multiple of 4 causes an error currently. Fix it by changing the nvmem stride to 1. 
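The stride is the alignment the nvmem core checks on the requested offset before the provider's reg_read() ever runs, so a stride of 4 refuses any field stored at an unaligned offset. A quick model of that check; the 0xfa offset is only an example value.

  #include <stdio.h>

  /* model of the alignment check applied to nvmem accesses */
  static int nvmem_access_ok(unsigned int offset, unsigned int stride)
  {
          return (offset % stride) == 0;
  }

  int main(void)
  {
          printf("stride 4, offset 0xfa: %d\n", nvmem_access_ok(0xfa, 4)); /* 0 */
          printf("stride 1, offset 0xfa: %d\n", nvmem_access_ok(0xfa, 1)); /* 1 */
          return 0;
  }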
Signed-off-by: David Lechner [Bartosz: tweaked the commit message] Signed-off-by: Bartosz Golaszewski Signed-off-by: Greg Kroah-Hartman --- drivers/misc/eeprom/at24.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/misc/eeprom/at24.c b/drivers/misc/eeprom/at24.c index 372b2060fbba..4cc0b42f2acc 100644 --- a/drivers/misc/eeprom/at24.c +++ b/drivers/misc/eeprom/at24.c @@ -776,7 +776,7 @@ static int at24_probe(struct i2c_client *client, const struct i2c_device_id *id) at24->nvmem_config.reg_read = at24_read; at24->nvmem_config.reg_write = at24_write; at24->nvmem_config.priv = at24; - at24->nvmem_config.stride = 4; + at24->nvmem_config.stride = 1; at24->nvmem_config.word_size = 1; at24->nvmem_config.size = chip.byte_len; From 3df23f7ce7255d1ef2a616071cac359a245fb6de Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 15 Dec 2017 10:32:03 +0100 Subject: [PATCH 0416/1013] posix-timer: Properly check sigevent->sigev_notify commit cef31d9af908243421258f1df35a4a644604efbe upstream. timer_create() specifies via sigevent->sigev_notify the signal delivery for the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD and (SIGEV_SIGNAL | SIGEV_THREAD_ID). The sanity check in good_sigevent() is only checking the valid combination for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is not set it accepts any random value. This has no real effects on the posix timer and signal delivery code, but it affects show_timer() which handles the output of /proc/$PID/timers. That function uses a string array to pretty print sigev_notify. The access to that array has no bound checks, so random sigev_notify cause access beyond the array bounds. Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID masking from various code pathes as SIGEV_NONE can never be set in combination with SIGEV_THREAD_ID. 
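For illustration (not from the patch), a small user-space sketch of the sigev_notify combinations that good_sigevent() now accepts; SIGEV_THREAD_ID must be paired with SIGEV_SIGNAL and aimed at a thread in the caller's own thread group, and any other combination is rejected with EINVAL instead of being passed through unchecked:

/* Build with -lrt on older glibc. */
#define _GNU_SOURCE
#include <signal.h>
#include <time.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#ifndef sigev_notify_thread_id
#define sigev_notify_thread_id _sigev_un._tid   /* glibc does not expose this name */
#endif

int main(void)
{
        struct sigevent sev;
        timer_t tid;

        memset(&sev, 0, sizeof(sev));

        /*
         * Valid: SIGEV_SIGNAL | SIGEV_THREAD_ID targeting a thread in this
         * thread group. Plain SIGEV_SIGNAL, SIGEV_THREAD and SIGEV_NONE are
         * also accepted by the new switch; everything else fails.
         */
        sev.sigev_notify = SIGEV_SIGNAL | SIGEV_THREAD_ID;
        sev.sigev_signo = SIGRTMIN;
        sev.sigev_notify_thread_id = syscall(SYS_gettid);

        if (timer_create(CLOCK_MONOTONIC, &sev, &tid) == -1) {
                perror("timer_create");
                return 1;
        }
        timer_delete(tid);
        return 0;
}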
Reported-by: Eric Biggers Reported-by: Dmitry Vyukov Reported-by: Alexey Dobriyan Signed-off-by: Thomas Gleixner Cc: John Stultz Signed-off-by: Greg Kroah-Hartman --- kernel/time/posix-timers.c | 29 +++++++++++++++++------------ 1 file changed, 17 insertions(+), 12 deletions(-) diff --git a/kernel/time/posix-timers.c b/kernel/time/posix-timers.c index 13d6881f908b..ec999f32c840 100644 --- a/kernel/time/posix-timers.c +++ b/kernel/time/posix-timers.c @@ -434,17 +434,22 @@ static struct pid *good_sigevent(sigevent_t * event) { struct task_struct *rtn = current->group_leader; - if ((event->sigev_notify & SIGEV_THREAD_ID ) && - (!(rtn = find_task_by_vpid(event->sigev_notify_thread_id)) || - !same_thread_group(rtn, current) || - (event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_SIGNAL)) + switch (event->sigev_notify) { + case SIGEV_SIGNAL | SIGEV_THREAD_ID: + rtn = find_task_by_vpid(event->sigev_notify_thread_id); + if (!rtn || !same_thread_group(rtn, current)) + return NULL; + /* FALLTHRU */ + case SIGEV_SIGNAL: + case SIGEV_THREAD: + if (event->sigev_signo <= 0 || event->sigev_signo > SIGRTMAX) + return NULL; + /* FALLTHRU */ + case SIGEV_NONE: + return task_pid(rtn); + default: return NULL; - - if (((event->sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) && - ((event->sigev_signo <= 0) || (event->sigev_signo > SIGRTMAX))) - return NULL; - - return task_pid(rtn); + } } static struct k_itimer * alloc_posix_timer(void) @@ -669,7 +674,7 @@ void common_timer_get(struct k_itimer *timr, struct itimerspec64 *cur_setting) struct timespec64 ts64; bool sig_none; - sig_none = (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE; + sig_none = timr->it_sigev_notify == SIGEV_NONE; iv = timr->it_interval; /* interval timer ? */ @@ -856,7 +861,7 @@ int common_timer_set(struct k_itimer *timr, int flags, timr->it_interval = timespec64_to_ktime(new_setting->it_interval); expires = timespec64_to_ktime(new_setting->it_value); - sigev_none = (timr->it_sigev_notify & ~SIGEV_THREAD_ID) == SIGEV_NONE; + sigev_none = timr->it_sigev_notify == SIGEV_NONE; kc->timer_arm(timr, expires, flags & TIMER_ABSTIME, sigev_none); timr->it_active = !sigev_none; From 0aa5d007ba673bc35378422b0c2b2b797da69424 Mon Sep 17 00:00:00 2001 From: Adam Wallis Date: Mon, 27 Nov 2017 10:45:01 -0500 Subject: [PATCH 0417/1013] dmaengine: dmatest: move callback wait queue to thread context commit 6f6a23a213be51728502b88741ba6a10cda2441d upstream. Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") introduced a bug (that is in fact documented by the patch commit text) that leaves behind a dangling pointer. Since the done_wait structure is allocated on the stack, future invocations to the DMATEST can produce undesirable results (e.g., corrupted spinlocks). Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times out") attempted to WARN the user that the stack was likely corrupted but did not fix the actual issue. This patch fixes the issue by pushing the wait queue and callback structs into the the thread structure. If a failure occurs due to time, dmaengine_terminate_all will force the callback to safely call wake_up_all() without possibility of using a freed pointer. 
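The lifetime rule behind the fix, sketched with illustrative names rather than dmatest's own: state handed to an asynchronous callback must live at least as long as the callback can run, so it is embedded in the long-lived per-thread object and recovered with container_of() instead of living on the submitter's stack.

#include <linux/wait.h>
#include <linux/kernel.h>

struct demo_done {
        bool done;
        wait_queue_head_t *wait;
};

struct demo_thread {
        wait_queue_head_t done_wait;    /* lives as long as the thread */
        struct demo_done test_done;     /* ditto, never on the stack   */
        bool done;
};

static void demo_callback(void *arg)
{
        struct demo_done *done = arg;
        struct demo_thread *thread =
                container_of(arg, struct demo_thread, test_done);

        if (!thread->done) {            /* submitter is still waiting */
                done->done = true;
                wake_up_all(done->wait);
        }
}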
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605 Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") Reviewed-by: Sinan Kaya Suggested-by: Shunyong Yang Signed-off-by: Adam Wallis Signed-off-by: Vinod Koul Signed-off-by: Greg Kroah-Hartman --- drivers/dma/dmatest.c | 55 ++++++++++++++++++++++++------------------- 1 file changed, 31 insertions(+), 24 deletions(-) diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c index 47edc7fbf91f..ec5f9d2bc820 100644 --- a/drivers/dma/dmatest.c +++ b/drivers/dma/dmatest.c @@ -155,6 +155,12 @@ MODULE_PARM_DESC(run, "Run the test (default: false)"); #define PATTERN_COUNT_MASK 0x1f #define PATTERN_MEMSET_IDX 0x01 +/* poor man's completion - we want to use wait_event_freezable() on it */ +struct dmatest_done { + bool done; + wait_queue_head_t *wait; +}; + struct dmatest_thread { struct list_head node; struct dmatest_info *info; @@ -165,6 +171,8 @@ struct dmatest_thread { u8 **dsts; u8 **udsts; enum dma_transaction_type type; + wait_queue_head_t done_wait; + struct dmatest_done test_done; bool done; }; @@ -342,18 +350,25 @@ static unsigned int dmatest_verify(u8 **bufs, unsigned int start, return error_count; } -/* poor man's completion - we want to use wait_event_freezable() on it */ -struct dmatest_done { - bool done; - wait_queue_head_t *wait; -}; static void dmatest_callback(void *arg) { struct dmatest_done *done = arg; - - done->done = true; - wake_up_all(done->wait); + struct dmatest_thread *thread = + container_of(arg, struct dmatest_thread, done_wait); + if (!thread->done) { + done->done = true; + wake_up_all(done->wait); + } else { + /* + * If thread->done, it means that this callback occurred + * after the parent thread has cleaned up. This can + * happen in the case that driver doesn't implement + * the terminate_all() functionality and a dma operation + * did not occur within the timeout period + */ + WARN(1, "dmatest: Kernel memory may be corrupted!!\n"); + } } static unsigned int min_odd(unsigned int x, unsigned int y) @@ -424,9 +439,8 @@ static unsigned long long dmatest_KBs(s64 runtime, unsigned long long len) */ static int dmatest_func(void *data) { - DECLARE_WAIT_QUEUE_HEAD_ONSTACK(done_wait); struct dmatest_thread *thread = data; - struct dmatest_done done = { .wait = &done_wait }; + struct dmatest_done *done = &thread->test_done; struct dmatest_info *info; struct dmatest_params *params; struct dma_chan *chan; @@ -673,9 +687,9 @@ static int dmatest_func(void *data) continue; } - done.done = false; + done->done = false; tx->callback = dmatest_callback; - tx->callback_param = &done; + tx->callback_param = done; cookie = tx->tx_submit(tx); if (dma_submit_error(cookie)) { @@ -688,21 +702,12 @@ static int dmatest_func(void *data) } dma_async_issue_pending(chan); - wait_event_freezable_timeout(done_wait, done.done, + wait_event_freezable_timeout(thread->done_wait, done->done, msecs_to_jiffies(params->timeout)); status = dma_async_is_tx_complete(chan, cookie, NULL, NULL); - if (!done.done) { - /* - * We're leaving the timed out dma operation with - * dangling pointer to done_wait. To make this - * correct, we'll need to allocate wait_done for - * each test iteration and perform "who's gonna - * free it this time?" dancing. For now, just - * leave it dangling. 
- */ - WARN(1, "dmatest: Kernel stack may be corrupted!!\n"); + if (!done->done) { dmaengine_unmap_put(um); result("test timed out", total_tests, src_off, dst_off, len, 0); @@ -789,7 +794,7 @@ static int dmatest_func(void *data) dmatest_KBs(runtime, total_len), ret); /* terminate all transfers on specified channels */ - if (ret) + if (ret || failed_tests) dmaengine_terminate_all(chan); thread->done = true; @@ -849,6 +854,8 @@ static int dmatest_add_threads(struct dmatest_info *info, thread->info = info; thread->chan = dtc->chan; thread->type = type; + thread->test_done.wait = &thread->done_wait; + init_waitqueue_head(&thread->done_wait); smp_wmb(); thread->task = kthread_create(dmatest_func, thread, "%s-%s%u", dma_chan_name(chan), op, i); From 2dea756b487547788c1f28439fbc3df270e6bf30 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 12 Dec 2017 11:28:38 -0800 Subject: [PATCH 0418/1013] Revert "exec: avoid RLIMIT_STACK races with prlimit()" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 779f4e1c6c7c661db40dfebd6dd6bda7b5f88aa3 upstream. This reverts commit 04e35f4495dd560db30c25efca4eecae8ec8c375. SELinux runs with secureexec for all non-"noatsecure" domain transitions, which means lots of processes end up hitting the stack hard-limit change that was introduced in order to fix a race with prlimit(). That race fix will need to be redesigned. Reported-by: Laura Abbott Reported-by: Tomáš Trnka Signed-off-by: Kees Cook Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/exec.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 4726c777dd38..3e14ba25f678 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1340,15 +1340,10 @@ void setup_new_exec(struct linux_binprm * bprm) * avoid bad behavior from the prior rlimits. This has to * happen before arch_pick_mmap_layout(), which examines * RLIMIT_STACK, but after the point of no return to avoid - * races from other threads changing the limits. This also - * must be protected from races with prlimit() calls. + * needing to clean up the change on failure. */ - task_lock(current->group_leader); if (current->signal->rlim[RLIMIT_STACK].rlim_cur > _STK_LIM) current->signal->rlim[RLIMIT_STACK].rlim_cur = _STK_LIM; - if (current->signal->rlim[RLIMIT_STACK].rlim_max > _STK_LIM) - current->signal->rlim[RLIMIT_STACK].rlim_max = _STK_LIM; - task_unlock(current->group_leader); } arch_pick_mmap_layout(current->mm); From ae0bba6b38110e8b57b92aa820df8f78d3965fe6 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Sun, 3 Dec 2017 20:38:01 -0500 Subject: [PATCH 0419/1013] ext4: support fast symlinks from ext3 file systems commit fc82228a5e3860502dbf3bfa4a9570cb7093cf7f upstream. 407cd7fb83c0 (ext4: change fast symlink test to not rely on i_blocks) broke ~10 years old ext3 file systems created by 2.6.17. Any ELF executable fails because the /lib/ld-linux.so.2 fast symlink cannot be read anymore. The patch assumed fast symlinks were created in a specific way, but that's not true on these really old file systems. The new behavior is apparently needed only with the large EA inode feature. Revert to the old behavior if the large EA inode feature is not set. This makes my old VM boot again. 
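As a user-space illustration of the property the test relies on (file names are made up; the behaviour shown is the usual ext3/ext4 one), a short target is stored inside the inode and reports zero allocated blocks, while a long target needs a data block:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/stat.h>

static void show(const char *target, const char *link)
{
        struct stat st;

        unlink(link);
        if (symlink(target, link) || lstat(link, &st)) {
                perror(link);
                return;
        }
        printf("%-12s target=%3zu bytes  st_blocks=%lld\n",
               link, strlen(target), (long long)st.st_blocks);
}

int main(void)
{
        char long_target[200];

        memset(long_target, 'x', sizeof(long_target) - 1);
        long_target[sizeof(long_target) - 1] = '\0';

        show("short_target", "fast_link");      /* fits in the inode: 0 blocks */
        show(long_target, "slow_link");         /* needs an external block */
        return 0;
}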
Fixes: 407cd7fb83c0 (ext4: change fast symlink test to not rely on i_blocks) Signed-off-by: Andi Kleen Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger Signed-off-by: Greg Kroah-Hartman --- fs/ext4/inode.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 38eb621edd80..ea2ccc524bd9 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -149,6 +149,15 @@ static int ext4_meta_trans_blocks(struct inode *inode, int lblocks, */ int ext4_inode_is_fast_symlink(struct inode *inode) { + if (!(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL)) { + int ea_blocks = EXT4_I(inode)->i_file_acl ? + EXT4_CLUSTER_SIZE(inode->i_sb) >> 9 : 0; + + if (ext4_has_inline_data(inode)) + return 0; + + return (S_ISLNK(inode->i_mode) && inode->i_blocks - ea_blocks == 0); + } return S_ISLNK(inode->i_mode) && inode->i_size && (inode->i_size < EXT4_N_BLOCKS * 4); } From 4ff7da066d64e5083ef886203ba3a0623343c632 Mon Sep 17 00:00:00 2001 From: Eryu Guan Date: Sun, 3 Dec 2017 22:52:51 -0500 Subject: [PATCH 0420/1013] ext4: fix fdatasync(2) after fallocate(2) operation commit c894aa97577e47d3066b27b32499ecf899bfa8b0 upstream. Currently, fallocate(2) with KEEP_SIZE followed by a fdatasync(2) then crash, we'll see wrong allocated block number (stat -c %b), the blocks allocated beyond EOF are all lost. fstests generic/468 exposes this bug. Commit 67a7d5f561f4 ("ext4: fix fdatasync(2) after extent manipulation operations") fixed all the other extent manipulation operation paths such as hole punch, zero range, collapse range etc., but forgot the fallocate case. So similarly, fix it by recording the correct journal tid in ext4 inode in fallocate(2) path, so that ext4_sync_file() will wait for the right tid to be committed on fdatasync(2). This addresses the test failure in xfstests test generic/468. Signed-off-by: Eryu Guan Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/extents.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index 07bca11749d4..c941251ac0c0 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -4722,6 +4722,7 @@ static int ext4_alloc_file_blocks(struct file *file, ext4_lblk_t offset, EXT4_INODE_EOFBLOCKS); } ext4_mark_inode_dirty(handle, inode); + ext4_update_inode_fsync_trans(handle, inode, 1); ret2 = ext4_journal_stop(handle); if (ret2) break; From 0228af23dd6962c16682d85fdf1bab6bbf77a007 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Sun, 10 Dec 2017 23:44:11 -0500 Subject: [PATCH 0421/1013] ext4: add missing error check in __ext4_new_inode() commit 996fc4477a0ea28226b30d175f053fb6f9a4fa36 upstream. It's possible for ext4_get_acl() to return an ERR_PTR. So we need to add a check for this case in __ext4_new_inode(). Otherwise on an error we can end up oops the kernel. This was getting triggered by xfstests generic/388, which is a test which exercises the shutdown code path. 
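The convention being handled, sketched with a hypothetical caller: get_acl() can return a valid ACL, NULL when none is set, or an ERR_PTR-encoded errno, and each of the three cases needs its own branch.

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/posix_acl.h>

static int example_default_acl(struct inode *dir)
{
        struct posix_acl *p = get_acl(dir, ACL_TYPE_DEFAULT);

        if (IS_ERR(p))                  /* an errno packed into the pointer */
                return PTR_ERR(p);
        if (!p)                         /* no default ACL, not an error */
                return 0;

        /* ... use p ... */
        posix_acl_release(p);
        return 0;
}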
Signed-off-by: Theodore Ts'o Signed-off-by: Greg Kroah-Hartman --- fs/ext4/ialloc.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c index c5f697a3fad4..207588dc803e 100644 --- a/fs/ext4/ialloc.c +++ b/fs/ext4/ialloc.c @@ -816,6 +816,8 @@ struct inode *__ext4_new_inode(handle_t *handle, struct inode *dir, #ifdef CONFIG_EXT4_FS_POSIX_ACL struct posix_acl *p = get_acl(dir, ACL_TYPE_DEFAULT); + if (IS_ERR(p)) + return ERR_CAST(p); if (p) { int acl_size = p->a_count * sizeof(ext4_acl_entry); From 38620054ea18edb165451bf24367f212960ff344 Mon Sep 17 00:00:00 2001 From: Chandan Rajendra Date: Mon, 11 Dec 2017 15:00:57 -0500 Subject: [PATCH 0422/1013] ext4: fix crash when a directory's i_size is too small commit 9d5afec6b8bd46d6ed821aa1579634437f58ef1f upstream. On a ppc64 machine, when mounting a fuzzed ext2 image (generated by fsfuzzer) the following call trace is seen, VFS: brelse: Trying to free free buffer WARNING: CPU: 1 PID: 6913 at /root/repos/linux/fs/buffer.c:1165 .__brelse.part.6+0x24/0x40 .__brelse.part.6+0x20/0x40 (unreliable) .ext4_find_entry+0x384/0x4f0 .ext4_lookup+0x84/0x250 .lookup_slow+0xdc/0x230 .walk_component+0x268/0x400 .path_lookupat+0xec/0x2d0 .filename_lookup+0x9c/0x1d0 .vfs_statx+0x98/0x140 .SyS_newfstatat+0x48/0x80 system_call+0x58/0x6c This happens because the directory that ext4_find_entry() looks up has inode->i_size that is less than the block size of the filesystem. This causes 'nblocks' to have a value of zero. ext4_bread_batch() ends up not reading any of the directory file's blocks. This renders the entries in bh_use[] array to continue to have garbage data. buffer_uptodate() on bh_use[0] can then return a zero value upon which brelse() function is invoked. This commit fixes the bug by returning -ENOENT when the directory file has no associated blocks. Reported-by: Abdul Haleem Signed-off-by: Chandan Rajendra Signed-off-by: Greg Kroah-Hartman --- fs/ext4/namei.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index bd48a8d83961..fccf295fcb03 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1399,6 +1399,10 @@ static struct buffer_head * ext4_find_entry (struct inode *dir, "falling back\n")); } nblocks = dir->i_size >> EXT4_BLOCK_SIZE_BITS(sb); + if (!nblocks) { + ret = NULL; + goto cleanup_and_exit; + } start = EXT4_I(dir)->i_dir_start_lookup; if (start >= nblocks) start = 0; From b3e3728ed55b9ddd7b1600ebad11f16aa665dfcb Mon Sep 17 00:00:00 2001 From: Guy Levi Date: Wed, 25 Oct 2017 22:39:35 +0300 Subject: [PATCH 0423/1013] IB/mlx4: Fix RSS's QPC attributes assignments [ Upstream commit 108809a0571cd1e1b317c5c083a371e163e1f8f9 ] In the modify QP handler the base_qpn_udp field in the RSS QPC is overwrite later by irrelevant value assignment. Hence, ingress packets which gets to the RSS QP will be steered then to a garbage QPN. The patch fixes this by skipping the above assignment when a RSS QP is modified, also, the RSS context's attributes assignments are relocated just before the context is posted to avoid future issues like this. Additionally, this patch takes the opportunity to change the code to be disciplined to the device's manual and assigns the RSS QP context just at RESET to INIT transition. 
Fixes:3078f5f1bd8b ("IB/mlx4: Add support for RSS QP") Signed-off-by: Guy Levi Reviewed-by: Yishai Hadas Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx4/qp.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/infiniband/hw/mlx4/qp.c b/drivers/infiniband/hw/mlx4/qp.c index 17e44c86577a..fcfa08747899 100644 --- a/drivers/infiniband/hw/mlx4/qp.c +++ b/drivers/infiniband/hw/mlx4/qp.c @@ -2182,11 +2182,6 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, context->flags = cpu_to_be32((to_mlx4_state(new_state) << 28) | (to_mlx4_st(dev, qp->mlx4_ib_qp_type) << 16)); - if (rwq_ind_tbl) { - fill_qp_rss_context(context, qp); - context->flags |= cpu_to_be32(1 << MLX4_RSS_QPC_FLAG_OFFSET); - } - if (!(attr_mask & IB_QP_PATH_MIG_STATE)) context->flags |= cpu_to_be32(MLX4_QP_PM_MIGRATED << 11); else { @@ -2387,6 +2382,7 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, context->pd = cpu_to_be32(pd->pdn); if (!rwq_ind_tbl) { + context->params1 = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28); get_cqs(qp, src_type, &send_cq, &recv_cq); } else { /* Set dummy CQs to be compatible with HV and PRM */ send_cq = to_mcq(rwq_ind_tbl->ind_tbl[0]->cq); @@ -2394,7 +2390,6 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, } context->cqn_send = cpu_to_be32(send_cq->mcq.cqn); context->cqn_recv = cpu_to_be32(recv_cq->mcq.cqn); - context->params1 = cpu_to_be32(MLX4_IB_ACK_REQ_FREQ << 28); /* Set "fast registration enabled" for all kernel QPs */ if (!ibuobject) @@ -2513,7 +2508,7 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, MLX4_IB_LINK_TYPE_ETH; if (dev->dev->caps.tunnel_offload_mode == MLX4_TUNNEL_OFFLOAD_MODE_VXLAN) { /* set QP to receive both tunneled & non-tunneled packets */ - if (!(context->flags & cpu_to_be32(1 << MLX4_RSS_QPC_FLAG_OFFSET))) + if (!rwq_ind_tbl) context->srqn = cpu_to_be32(7 << 28); } } @@ -2562,6 +2557,13 @@ static int __mlx4_ib_modify_qp(void *src, enum mlx4_ib_source_type src_type, } } + if (rwq_ind_tbl && + cur_state == IB_QPS_RESET && + new_state == IB_QPS_INIT) { + fill_qp_rss_context(context, qp); + context->flags |= cpu_to_be32(1 << MLX4_RSS_QPC_FLAG_OFFSET); + } + err = mlx4_qp_modify(dev->dev, &qp->mtt, to_mlx4_state(cur_state), to_mlx4_state(new_state), context, optpar, sqd_event, &qp->mqp); From dba1ca0e9a6fc57703eaa3c8d7293b8efc72c2af Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Szymanski?= Date: Fri, 10 Nov 2017 10:01:43 +0100 Subject: [PATCH 0424/1013] HID: cp2112: fix broken gpio_direction_input callback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 7da85fbf1c87d4f73621e0e7666a3387497075a9 ] When everything goes smoothly, ret is set to 0 which makes the function to return EIO error. 
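The distilled pattern the fix restores, as a hypothetical helper: a routine that may return either a negative errno or a short transfer length is normalized so callers always see 0 on success and a negative errno otherwise.

#include <linux/errno.h>

static int demo_check_xfer(int ret, int expected_len)
{
        if (ret == expected_len)
                return 0;                       /* complete transfer */
        return ret < 0 ? ret : -EIO;            /* keep real errors, map short ones to -EIO */
}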
Fixes: 8e9faa15469e ("HID: cp2112: fix gpio-callback error handling") Signed-off-by: Sébastien Szymanski Reviewed-by: Benjamin Tissoires Signed-off-by: Jiri Kosina Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/hid/hid-cp2112.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/drivers/hid/hid-cp2112.c b/drivers/hid/hid-cp2112.c index 078026f63b6f..4e940a096b2a 100644 --- a/drivers/hid/hid-cp2112.c +++ b/drivers/hid/hid-cp2112.c @@ -196,6 +196,8 @@ static int cp2112_gpio_direction_input(struct gpio_chip *chip, unsigned offset) HID_REQ_GET_REPORT); if (ret != CP2112_GPIO_CONFIG_LENGTH) { hid_err(hdev, "error requesting GPIO config: %d\n", ret); + if (ret >= 0) + ret = -EIO; goto exit; } @@ -205,8 +207,10 @@ static int cp2112_gpio_direction_input(struct gpio_chip *chip, unsigned offset) ret = hid_hw_raw_request(hdev, CP2112_GPIO_CONFIG, buf, CP2112_GPIO_CONFIG_LENGTH, HID_FEATURE_REPORT, HID_REQ_SET_REPORT); - if (ret < 0) { + if (ret != CP2112_GPIO_CONFIG_LENGTH) { hid_err(hdev, "error setting GPIO config: %d\n", ret); + if (ret >= 0) + ret = -EIO; goto exit; } @@ -214,7 +218,7 @@ static int cp2112_gpio_direction_input(struct gpio_chip *chip, unsigned offset) exit: mutex_unlock(&dev->lock); - return ret < 0 ? ret : -EIO; + return ret; } static void cp2112_gpio_set(struct gpio_chip *chip, unsigned offset, int value) From 71307dd192e7ac012894dfebd9c47cb5afc0611b Mon Sep 17 00:00:00 2001 From: Robert Stonehouse Date: Tue, 7 Nov 2017 17:30:30 +0000 Subject: [PATCH 0425/1013] sfc: don't warn on successful change of MAC [ Upstream commit cbad52e92ad7f01f0be4ca58bde59462dc1afe3a ] Fixes: 535a61777f44e ("sfc: suppress handled MCDI failures when changing the MAC address") Signed-off-by: Bert Kenward Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/sfc/ef10.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/sfc/ef10.c b/drivers/net/ethernet/sfc/ef10.c index 13f72f5b18d2..09352ee43b55 100644 --- a/drivers/net/ethernet/sfc/ef10.c +++ b/drivers/net/ethernet/sfc/ef10.c @@ -5726,7 +5726,7 @@ static int efx_ef10_set_mac_address(struct efx_nic *efx) * MCFW do not support VFs. */ rc = efx_ef10_vport_set_mac_address(efx); - } else { + } else if (rc) { efx_mcdi_display_error(efx, MC_CMD_VADAPTOR_SET_MAC, sizeof(inbuf), NULL, 0, rc); } From f9a40df232448539f6586f9a0c72ca89fd694fbe Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 9 Nov 2017 18:09:33 +0100 Subject: [PATCH 0426/1013] fbdev: controlfb: Add missing modes to fix out of bounds access [ Upstream commit ac831a379d34109451b3c41a44a20ee10ecb615f ] Dan's static analysis says: drivers/video/fbdev/controlfb.c:560 control_setup() error: buffer overflow 'control_mac_modes' 20 <= 21 Indeed, control_mac_modes[] has only 20 elements, while VMODE_MAX is 22, which may lead to an out of bounds read when parsing vmode commandline options. The bug was introduced in v2.4.5.6, when 2 new modes were added to macmodes.h, but control_mac_modes[] wasn't updated: https://kernel.opensuse.org/cgit/kernel/diff/include/video/macmodes.h?h=v2.5.2&id=29f279c764808560eaceb88fef36cbc35c529aad Augment control_mac_modes[] with the two new video modes to fix this. 
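One generic guard against this class of bug (the commit itself fixes it by adding the two missing entries), sketched with an illustrative table: bound the lookup by the array's real size instead of a constant that can drift out of sync with it.

#include <linux/kernel.h>
#include <linux/errno.h>

struct demo_mode { int cmode; };

static const struct demo_mode demo_modes[] = {
        { 1 }, { 2 }, { 3 },    /* ... */
};

static int demo_lookup(int vmode)       /* vmode is 1-based, as in control_setup() */
{
        if (vmode < 1 || vmode > ARRAY_SIZE(demo_modes))
                return -EINVAL;         /* reject instead of reading past the end */
        return demo_modes[vmode - 1].cmode;
}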
Reported-by: Dan Carpenter Signed-off-by: Geert Uytterhoeven Cc: Dan Carpenter Cc: Benjamin Herrenschmidt Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/controlfb.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/video/fbdev/controlfb.h b/drivers/video/fbdev/controlfb.h index 6026c60fc100..261522fabdac 100644 --- a/drivers/video/fbdev/controlfb.h +++ b/drivers/video/fbdev/controlfb.h @@ -141,5 +141,7 @@ static struct max_cmodes control_mac_modes[] = { {{ 1, 2}}, /* 1152x870, 75Hz */ {{ 0, 1}}, /* 1280x960, 75Hz */ {{ 0, 1}}, /* 1280x1024, 75Hz */ + {{ 1, 2}}, /* 1152x768, 60Hz */ + {{ 0, 1}}, /* 1600x1024, 60Hz */ }; From 456279d0e3ad7cb3b6b90aeb07c189c7f8590ab0 Mon Sep 17 00:00:00 2001 From: Ladislav Michl Date: Thu, 9 Nov 2017 18:09:30 +0100 Subject: [PATCH 0427/1013] video: udlfb: Fix read EDID timeout [ Upstream commit c98769475575c8a585f5b3952f4b5f90266f699b ] While usb_control_msg function expects timeout in miliseconds, a value of HZ is used. Replace it with USB_CTRL_GET_TIMEOUT and also fix error message which looks like: udlfb: Read EDID byte 78 failed err ffffff92 as error is either negative errno or number of bytes transferred use %d format specifier. Returned EDID is in second byte, so return error when less than two bytes are received. Fixes: 18dffdf8913a ("staging: udlfb: enhance EDID and mode handling support") Signed-off-by: Ladislav Michl Cc: Bernie Thompson Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/udlfb.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/video/fbdev/udlfb.c b/drivers/video/fbdev/udlfb.c index ef08a104fb42..d44f14242016 100644 --- a/drivers/video/fbdev/udlfb.c +++ b/drivers/video/fbdev/udlfb.c @@ -769,11 +769,11 @@ static int dlfb_get_edid(struct dlfb_data *dev, char *edid, int len) for (i = 0; i < len; i++) { ret = usb_control_msg(dev->udev, - usb_rcvctrlpipe(dev->udev, 0), (0x02), - (0x80 | (0x02 << 5)), i << 8, 0xA1, rbuf, 2, - HZ); - if (ret < 1) { - pr_err("Read EDID byte %d failed err %x\n", i, ret); + usb_rcvctrlpipe(dev->udev, 0), 0x02, + (0x80 | (0x02 << 5)), i << 8, 0xA1, + rbuf, 2, USB_CTRL_GET_TIMEOUT); + if (ret < 2) { + pr_err("Read EDID byte %d failed: %d\n", i, ret); i--; break; } From df235d2ee36e70a43b122c0eb34064794ef4464c Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Thu, 9 Nov 2017 18:09:28 +0100 Subject: [PATCH 0428/1013] video: fbdev: au1200fb: Release some resources if a memory allocation fails [ Upstream commit 451f130602619a17c8883dd0b71b11624faffd51 ] We should go through the error handling code instead of returning -ENOMEM directly. 
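The pattern the fix follows, sketched with hypothetical allocations: once an earlier allocation has succeeded, a later failure must branch to a common unwind label rather than returning directly.

#include <linux/slab.h>

static int demo_probe(void **out_a, void **out_b)
{
        void *a, *b;
        int ret;

        a = kzalloc(64, GFP_KERNEL);
        if (!a)
                return -ENOMEM;         /* nothing allocated yet, plain return is fine */

        b = kzalloc(64, GFP_KERNEL);
        if (!b) {
                ret = -ENOMEM;          /* returning directly here would leak 'a' */
                goto err_free_a;
        }

        *out_a = a;
        *out_b = b;
        return 0;

err_free_a:
        kfree(a);
        return ret;
}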
Signed-off-by: Christophe JAILLET Cc: Tejun Heo Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/au1200fb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/video/fbdev/au1200fb.c b/drivers/video/fbdev/au1200fb.c index 5f04b4096c42..99d6cfb168b5 100644 --- a/drivers/video/fbdev/au1200fb.c +++ b/drivers/video/fbdev/au1200fb.c @@ -1701,7 +1701,8 @@ static int au1200fb_drv_probe(struct platform_device *dev) if (!fbdev->fb_mem) { print_err("fail to allocate frambuffer (size: %dK))", fbdev->fb_len / 1024); - return -ENOMEM; + ret = -ENOMEM; + goto failed; } /* From d6f59ef5b840c5815430d63bb08e8dcbd9e2a132 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Thu, 9 Nov 2017 18:09:28 +0100 Subject: [PATCH 0429/1013] video: fbdev: au1200fb: Return an error code if a memory allocation fails [ Upstream commit 8cae353e6b01ac3f18097f631cdbceb5ff28c7f3 ] 'ret' is known to be 0 at this point. In case of memory allocation error in 'framebuffer_alloc()', return -ENOMEM instead. Signed-off-by: Christophe JAILLET Cc: Tejun Heo Signed-off-by: Bartlomiej Zolnierkiewicz Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/video/fbdev/au1200fb.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/video/fbdev/au1200fb.c b/drivers/video/fbdev/au1200fb.c index 99d6cfb168b5..6c542d0ca076 100644 --- a/drivers/video/fbdev/au1200fb.c +++ b/drivers/video/fbdev/au1200fb.c @@ -1681,8 +1681,10 @@ static int au1200fb_drv_probe(struct platform_device *dev) fbi = framebuffer_alloc(sizeof(struct au1200fb_device), &dev->dev); - if (!fbi) + if (!fbi) { + ret = -ENOMEM; goto failed; + } _au1200fb_infos[plane] = fbi; fbdev = fbi->par; From 28461320cc952e8a971ec415743ae97120937fbd Mon Sep 17 00:00:00 2001 From: Philipp Zabel Date: Tue, 7 Nov 2017 13:12:17 +0100 Subject: [PATCH 0430/1013] rtc: pcf8563: fix output clock rate [ Upstream commit a3350f9c57ffad569c40f7320b89da1f3061c5bb ] The pcf8563_clkout_recalc_rate function erroneously ignores the frequency index read from the CLKO register and always returns 32768 Hz. Fixes: a39a6405d5f9 ("rtc: pcf8563: add CLKOUT to common clock framework") Signed-off-by: Philipp Zabel Signed-off-by: Alexandre Belloni Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-pcf8563.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c index cea6ea4df970..8c836c51a508 100644 --- a/drivers/rtc/rtc-pcf8563.c +++ b/drivers/rtc/rtc-pcf8563.c @@ -422,7 +422,7 @@ static unsigned long pcf8563_clkout_recalc_rate(struct clk_hw *hw, return 0; buf &= PCF8563_REG_CLKO_F_MASK; - return clkout_rates[ret]; + return clkout_rates[buf]; } static long pcf8563_clkout_round_rate(struct clk_hw *hw, unsigned long rate, From 73e31f2af17843c640fd7e0c8d73f13773dd1777 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 7 Nov 2017 11:46:05 +0100 Subject: [PATCH 0431/1013] scsi: aacraid: use timespec64 instead of timeval [ Upstream commit 820f188659122602ab217dd80cfa32b3ac0c55c0 ] aacraid passes the current time to the firmware in one of two ways, either as year/month/day/... or as 32-bit unsigned seconds. The first one is broken on 32-bit architectures as it cannot go past year 2038. Using timespec64 here makes it behave properly on both 32-bit and 64-bit architectures, and avoids relying on signed integer overflow to pass times into the second interface. 
The interface used in aac_send_hosttime() however is still problematic in year 2106 when 32-bit seconds overflow. Hopefully we don't have to worry about aacraid by that time. Signed-off-by: Arnd Bergmann Reviewed-by: Dave Carroll Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/aacraid/commsup.c | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/scsi/aacraid/commsup.c b/drivers/scsi/aacraid/commsup.c index dfe8e70f8d99..525a652dab48 100644 --- a/drivers/scsi/aacraid/commsup.c +++ b/drivers/scsi/aacraid/commsup.c @@ -2383,19 +2383,19 @@ static int aac_send_wellness_command(struct aac_dev *dev, char *wellness_str, goto out; } -int aac_send_safw_hostttime(struct aac_dev *dev, struct timeval *now) +int aac_send_safw_hostttime(struct aac_dev *dev, struct timespec64 *now) { struct tm cur_tm; char wellness_str[] = "TD\010\0\0\0\0\0\0\0\0\0DW\0\0ZZ"; u32 datasize = sizeof(wellness_str); - unsigned long local_time; + time64_t local_time; int ret = -ENODEV; if (!dev->sa_firmware) goto out; - local_time = (u32)(now->tv_sec - (sys_tz.tz_minuteswest * 60)); - time_to_tm(local_time, 0, &cur_tm); + local_time = (now->tv_sec - (sys_tz.tz_minuteswest * 60)); + time64_to_tm(local_time, 0, &cur_tm); cur_tm.tm_mon += 1; cur_tm.tm_year += 1900; wellness_str[8] = bin2bcd(cur_tm.tm_hour); @@ -2412,7 +2412,7 @@ int aac_send_safw_hostttime(struct aac_dev *dev, struct timeval *now) return ret; } -int aac_send_hosttime(struct aac_dev *dev, struct timeval *now) +int aac_send_hosttime(struct aac_dev *dev, struct timespec64 *now) { int ret = -ENOMEM; struct fib *fibptr; @@ -2424,7 +2424,7 @@ int aac_send_hosttime(struct aac_dev *dev, struct timeval *now) aac_fib_init(fibptr); info = (__le32 *)fib_data(fibptr); - *info = cpu_to_le32(now->tv_sec); + *info = cpu_to_le32(now->tv_sec); /* overflow in y2106 */ ret = aac_fib_send(SendHostTime, fibptr, sizeof(*info), FsaNormal, 1, 1, NULL, NULL); @@ -2496,7 +2496,7 @@ int aac_command_thread(void *data) } if (!time_before(next_check_jiffies,next_jiffies) && ((difference = next_jiffies - jiffies) <= 0)) { - struct timeval now; + struct timespec64 now; int ret; /* Don't even try to talk to adapter if its sick */ @@ -2506,15 +2506,15 @@ int aac_command_thread(void *data) next_check_jiffies = jiffies + ((long)(unsigned)check_interval) * HZ; - do_gettimeofday(&now); + ktime_get_real_ts64(&now); /* Synchronize our watches */ - if (((1000000 - (1000000 / HZ)) > now.tv_usec) - && (now.tv_usec > (1000000 / HZ))) - difference = (((1000000 - now.tv_usec) * HZ) - + 500000) / 1000000; + if (((NSEC_PER_SEC - (NSEC_PER_SEC / HZ)) > now.tv_nsec) + && (now.tv_nsec > (NSEC_PER_SEC / HZ))) + difference = (((NSEC_PER_SEC - now.tv_nsec) * HZ) + + NSEC_PER_SEC / 2) / NSEC_PER_SEC; else { - if (now.tv_usec > 500000) + if (now.tv_nsec > NSEC_PER_SEC / 2) ++now.tv_sec; if (dev->sa_firmware) From c4760b9c8d06f541d37784afb3bbeaf895346b9b Mon Sep 17 00:00:00 2001 From: Pixel Ding Date: Wed, 8 Nov 2017 10:20:01 +0800 Subject: [PATCH 0432/1013] drm/amdgpu: bypass lru touch for KIQ ring submission MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit dce1e131dd4dc68099ff1b70aa03cd2d0acf8639 ] KIQ ring submission is used for register accessing on SRIOV VF that could happen both in irq enabled and irq disabled cases. Inversion lock could happen on adev->ring_lru_list_lock, while this operation is useless and just adds overhead in this use case. 
Signed-off-by: Pixel Ding Reviewed-by: Monk Liu Reviewed-by: Christian König Signed-off-by: Alex Deucher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 5ce65280b396..90adff83e489 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -136,7 +136,8 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring) if (ring->funcs->end_use) ring->funcs->end_use(ring); - amdgpu_ring_lru_touch(ring->adev, ring); + if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) + amdgpu_ring_lru_touch(ring->adev, ring); } /** From 32f733a8f470bf37b77da08d60868aace87f78f7 Mon Sep 17 00:00:00 2001 From: Rajat Jain Date: Tue, 31 Oct 2017 14:44:24 -0700 Subject: [PATCH 0433/1013] PM / s2idle: Clear the events_check_enabled flag [ Upstream commit 95b982b45122c57da2ee0b46cce70775e1d987af ] Problem: This flag does not get cleared currently in the suspend or resume path in the following cases: * In case some driver's suspend routine returns an error. * Successful s2idle case * etc? Why is this a problem: What happens is that the next suspend attempt could fail even though the user did not enable the flag by writing to /sys/power/wakeup_count. This is 1 use case how the issue can be seen (but similar use case with driver suspend failure can be thought of): 1. Read /sys/power/wakeup_count 2. echo count > /sys/power/wakeup_count 3. echo freeze > /sys/power/wakeup_count 4. Let the system suspend, and wakeup the system using some wake source that calls pm_wakeup_event() e.g. power button or something. 5. Note that the combined wakeup count would be incremented due to the pm_wakeup_event() in the resume path. 6. After resuming the events_check_enabled flag is still set. At this point if the user attempts to freeze again (without writing to /sys/power/wakeup_count), the suspend would fail even though there has been no wake event since the past resume. Address that by clearing the flag just before a resume is completed, so that it is always cleared for the corner cases mentioned above. Signed-off-by: Rajat Jain Acked-by: Pavel Machek Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/power/suspend.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c index ccd2d20e6b06..0685c4499431 100644 --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -437,7 +437,6 @@ static int suspend_enter(suspend_state_t state, bool *wakeup) error = suspend_ops->enter(state); trace_suspend_resume(TPS("machine_suspend"), state, false); - events_check_enabled = false; } else if (*wakeup) { error = -EBUSY; } @@ -582,6 +581,7 @@ static int enter_state(suspend_state_t state) pm_restore_gfp_mask(); Finish: + events_check_enabled = false; pm_pr_dbg("Finishing wakeup.\n"); suspend_finish(); Unlock: From b1d0ddc83e3cb2b9e5641ee52f235fcf463fd1d8 Mon Sep 17 00:00:00 2001 From: Pankaj Bharadiya Date: Tue, 7 Nov 2017 16:16:19 +0530 Subject: [PATCH 0434/1013] ASoC: Intel: Skylake: Fix uuid_module memory leak in failure case [ Upstream commit f8e066521192c7debe59127d90abbe2773577e25 ] In the loop that adds the uuid_module to the uuid_list list, allocated memory is not properly freed in the error path free uuid_list whenever any of the memory allocation in the loop fails to avoid memory leak. 
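The shape of the fix in isolation, with illustrative names: entries appended to a list inside a loop must all be unwound if an allocation fails part-way through.

#include <linux/list.h>
#include <linux/slab.h>

struct demo_entry {
        struct list_head list;
        int id;
};

static int demo_build(struct list_head *head, int count)
{
        struct demo_entry *e, *tmp;
        int i;

        for (i = 0; i < count; i++) {
                e = kzalloc(sizeof(*e), GFP_KERNEL);
                if (!e)
                        goto free_list;         /* unwind what was already added */
                e->id = i;
                list_add_tail(&e->list, head);
        }
        return 0;

free_list:
        list_for_each_entry_safe(e, tmp, head, list) {
                list_del(&e->list);
                kfree(e);
        }
        return -ENOMEM;
}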
Signed-off-by: Pankaj Bharadiya Signed-off-by: Guneshwor Singh Acked-By: Vinod Koul Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/soc/intel/skylake/skl-sst-utils.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/sound/soc/intel/skylake/skl-sst-utils.c b/sound/soc/intel/skylake/skl-sst-utils.c index 369ef7ce981c..8ff89280d9fd 100644 --- a/sound/soc/intel/skylake/skl-sst-utils.c +++ b/sound/soc/intel/skylake/skl-sst-utils.c @@ -251,6 +251,7 @@ int snd_skl_parse_uuids(struct sst_dsp *ctx, const struct firmware *fw, struct uuid_module *module; struct firmware stripped_fw; unsigned int safe_file; + int ret = 0; /* Get the FW pointer to derive ADSP header */ stripped_fw.data = fw->data; @@ -299,8 +300,10 @@ int snd_skl_parse_uuids(struct sst_dsp *ctx, const struct firmware *fw, for (i = 0; i < num_entry; i++, mod_entry++) { module = kzalloc(sizeof(*module), GFP_KERNEL); - if (!module) - return -ENOMEM; + if (!module) { + ret = -ENOMEM; + goto free_uuid_list; + } uuid_bin = (uuid_le *)mod_entry->uuid.id; memcpy(&module->uuid, uuid_bin, sizeof(module->uuid)); @@ -311,8 +314,8 @@ int snd_skl_parse_uuids(struct sst_dsp *ctx, const struct firmware *fw, size = sizeof(int) * mod_entry->instance_max_count; module->instance_id = devm_kzalloc(ctx->dev, size, GFP_KERNEL); if (!module->instance_id) { - kfree(module); - return -ENOMEM; + ret = -ENOMEM; + goto free_uuid_list; } list_add_tail(&module->list, &skl->uuid_list); @@ -323,6 +326,10 @@ int snd_skl_parse_uuids(struct sst_dsp *ctx, const struct firmware *fw, } return 0; + +free_uuid_list: + skl_freeup_uuid_list(skl); + return ret; } void skl_freeup_uuid_list(struct skl_sst *ctx) From ac6864737c0240499f1cd0689cd61b2bad6e04c8 Mon Sep 17 00:00:00 2001 From: Peter Ujfalusi Date: Wed, 8 Nov 2017 12:02:25 +0200 Subject: [PATCH 0435/1013] dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type [ Upstream commit 288e7560e4d3e259aa28f8f58a8dfe63627a1bf6 ] The used 0x1f mask is only valid for am335x family of SoC, different family using this type of crossbar might have different number of electable events. In case of am43xx family 0x3f mask should have been used for example. Instead of trying to handle each family's mask, just use u8 type to store the mux value since the event offsets are aligned to byte offset. 
Fixes: 42dbdcc6bf965 ("dmaengine: ti-dma-crossbar: Add support for crossbar on AM33xx/AM43xx") Signed-off-by: Peter Ujfalusi Signed-off-by: Vinod Koul Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/dma/ti-dma-crossbar.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/dma/ti-dma-crossbar.c b/drivers/dma/ti-dma-crossbar.c index f1d04b70ee67..7df910e7c348 100644 --- a/drivers/dma/ti-dma-crossbar.c +++ b/drivers/dma/ti-dma-crossbar.c @@ -49,12 +49,12 @@ struct ti_am335x_xbar_data { struct ti_am335x_xbar_map { u16 dma_line; - u16 mux_val; + u8 mux_val; }; -static inline void ti_am335x_xbar_write(void __iomem *iomem, int event, u16 val) +static inline void ti_am335x_xbar_write(void __iomem *iomem, int event, u8 val) { - writeb_relaxed(val & 0x1f, iomem + event); + writeb_relaxed(val, iomem + event); } static void ti_am335x_xbar_free(struct device *dev, void *route_data) @@ -105,7 +105,7 @@ static void *ti_am335x_xbar_route_allocate(struct of_phandle_args *dma_spec, } map->dma_line = (u16)dma_spec->args[0]; - map->mux_val = (u16)dma_spec->args[2]; + map->mux_val = (u8)dma_spec->args[2]; dma_spec->args[2] = 0; dma_spec->args_count = 2; From 540236e28f32749f5d1711020fbcab5be3afbf7f Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Mon, 6 Nov 2017 11:11:28 +0000 Subject: [PATCH 0436/1013] mlxsw: spectrum: Fix error return code in mlxsw_sp_port_create() [ Upstream commit d86fd113ebbb37726ef7c7cc6fd6d5ce377455d6 ] Fix to return a negative error code from the VID create error handling case instead of 0, as done elsewhere in this function. Fixes: c57529e1d5d8 ("mlxsw: spectrum: Replace vPorts with Port-VLAN") Signed-off-by: Wei Yongjun Reviewed-by: Ido Schimmel Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index 696b99e65a5a..db38880f54b4 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -2974,6 +2974,7 @@ static int mlxsw_sp_port_create(struct mlxsw_sp *mlxsw_sp, u8 local_port, if (IS_ERR(mlxsw_sp_port_vlan)) { dev_err(mlxsw_sp->bus_info->dev, "Port %d: Failed to create VID 1\n", mlxsw_sp_port->local_port); + err = PTR_ERR(mlxsw_sp_port_vlan); goto err_port_vlan_get; } From 27a12c783e700be095c39abe2749a20c3ab8ea11 Mon Sep 17 00:00:00 2001 From: Qiang Date: Thu, 28 Sep 2017 11:54:34 +0800 Subject: [PATCH 0437/1013] PCI/PME: Handle invalid data when reading Root Status [ Upstream commit 3ad3f8ce50914288731a3018b27ee44ab803e170 ] PCIe PME and native hotplug share the same interrupt number, so hotplug interrupts are also processed by PME. In some cases, e.g., a Link Down interrupt, a device may be present but unreachable, so when we try to read its Root Status register, the read fails and we get all ones data (0xffffffff). Previously, we interpreted that data as PCI_EXP_RTSTA_PME being set, i.e., "some device has asserted PME," so we scheduled pcie_pme_work_fn(). This caused an infinite loop because pcie_pme_work_fn() tried to handle PME requests until PCI_EXP_RTSTA_PME is cleared, but with the link down, PCI_EXP_RTSTA_PME can't be cleared. Check for the invalid 0xffffffff data everywhere we read the Root Status register. 
1469d17dd341 ("PCI: pciehp: Handle invalid data when reading from non-existent devices") added similar checks in the hotplug driver. Signed-off-by: Qiang Zheng [bhelgaas: changelog, also check in pcie_pme_work_fn(), use "~0" to follow other similar checks] Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/pcie/pme.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/pme.c b/drivers/pci/pcie/pme.c index fafdb165dd2e..df290aa58dce 100644 --- a/drivers/pci/pcie/pme.c +++ b/drivers/pci/pcie/pme.c @@ -226,6 +226,9 @@ static void pcie_pme_work_fn(struct work_struct *work) break; pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta); + if (rtsta == (u32) ~0) + break; + if (rtsta & PCI_EXP_RTSTA_PME) { /* * Clear PME status of the port. If there are other @@ -273,7 +276,7 @@ static irqreturn_t pcie_pme_irq(int irq, void *context) spin_lock_irqsave(&data->lock, flags); pcie_capability_read_dword(port, PCI_EXP_RTSTA, &rtsta); - if (!(rtsta & PCI_EXP_RTSTA_PME)) { + if (rtsta == (u32) ~0 || !(rtsta & PCI_EXP_RTSTA_PME)) { spin_unlock_irqrestore(&data->lock, flags); return IRQ_NONE; } From f9565e1e078329da7c92dd6812685bf8a2dd9b8e Mon Sep 17 00:00:00 2001 From: Shriya Date: Fri, 13 Oct 2017 10:06:41 +0530 Subject: [PATCH 0438/1013] powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo [ Upstream commit cd77b5ce208c153260ed7882d8910f2395bfaabd ] The call to /proc/cpuinfo in turn calls cpufreq_quick_get() which returns the last frequency requested by the kernel, but may not reflect the actual frequency the processor is running at. This patch makes a call to cpufreq_get() instead which returns the current frequency reported by the hardware. Fixes: fb5153d05a7d ("powerpc: powernv: Implement ppc_md.get_proc_freq()") Signed-off-by: Shriya Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powernv/setup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index bbb73aa0eb8f..bfe2aa702973 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -319,7 +319,7 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu) { unsigned long ret_freq; - ret_freq = cpufreq_quick_get(cpu) * 1000ul; + ret_freq = cpufreq_get(cpu) * 1000ul; /* * If the backend cpufreq driver does not exist, From fe1c1819b4340f53ac2da07991c810fceaf06bba Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Fri, 13 Oct 2017 21:35:43 +0300 Subject: [PATCH 0439/1013] PCI: Do not allocate more buses than available in parent [ Upstream commit a20c7f36bd3d20d245616ae223bb9d05dfb6f050 ] One can ask more buses to be reserved for hotplug bridges by passing pci=hpbussize=N in the kernel command line. If the parent bus does not have enough bus space available we incorrectly create child bus with the requested number of subordinate buses. In the example below hpbussize is set to one more than we have available buses in the root port: pci 0000:07:00.0: [8086:1578] type 01 class 0x060400 pci 0000:07:00.0: scanning [bus 00-00] behind bridge, pass 0 pci 0000:07:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring pci 0000:07:00.0: scanning [bus 00-00] behind bridge, pass 1 pci_bus 0000:08: busn_res: can not insert [bus 08-ff] under [bus 07-3f] (conflicts with (null) [bus 07-3f]) pci_bus 0000:08: scanning bus ... 
pci_bus 0000:0a: bus scan returning with max=40 pci_bus 0000:0a: busn_res: [bus 0a-ff] end is updated to 40 pci_bus 0000:0a: [bus 0a-40] partially hidden behind bridge 0000:07 [bus 07-3f] pci_bus 0000:08: bus scan returning with max=40 pci_bus 0000:08: busn_res: [bus 08-ff] end is updated to 40 Instead of allowing this, limit the subordinate number to be less than or equal the maximum subordinate number allocated for the parent bus (if it has any). Signed-off-by: Mika Westerberg [bhelgaas: remove irrelevant dmesg messages] Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/probe.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index ff94b69738a8..f285cd74088e 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1076,7 +1076,8 @@ int pci_scan_bridge(struct pci_bus *bus, struct pci_dev *dev, int max, int pass) child = pci_add_new_bus(bus, dev, max+1); if (!child) goto out; - pci_bus_insert_busn_res(child, max+1, 0xff); + pci_bus_insert_busn_res(child, max+1, + bus->busn_res.end); } max++; buses = (buses & 0xff000000) @@ -2433,6 +2434,10 @@ unsigned int pci_scan_child_bus(struct pci_bus *bus) if (bus->self && bus->self->is_hotplug_bridge && pci_hotplug_bus_size) { if (max - bus->busn_res.start < pci_hotplug_bus_size - 1) max = bus->busn_res.start + pci_hotplug_bus_size - 1; + + /* Do not allocate more buses than we have room left */ + if (max > bus->busn_res.end) + max = bus->busn_res.end; } /* From 837da470dc16168a4ea75989b0f902efe795dda4 Mon Sep 17 00:00:00 2001 From: Matthias Brugger Date: Mon, 30 Oct 2017 12:37:55 +0100 Subject: [PATCH 0440/1013] iommu/mediatek: Fix driver name [ Upstream commit 395df08d2e1de238a9c8c33fdcd0e2160efd63a9 ] There exist two Mediatek iommu drivers for the two different generations of the device. But both drivers have the same name "mtk-iommu". This breaks the registration of the second driver: Error: Driver 'mtk-iommu' is already registered, aborting... Fix this by changing the name for first generation to "mtk-iommu-v1". Fixes: b17336c55d89 ("iommu/mediatek: add support for mtk iommu generation one HW") Signed-off-by: Matthias Brugger Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/mtk_iommu_v1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c index bc1efbfb9ddf..542930cd183d 100644 --- a/drivers/iommu/mtk_iommu_v1.c +++ b/drivers/iommu/mtk_iommu_v1.c @@ -708,7 +708,7 @@ static struct platform_driver mtk_iommu_driver = { .probe = mtk_iommu_probe, .remove = mtk_iommu_remove, .driver = { - .name = "mtk-iommu", + .name = "mtk-iommu-v1", .of_match_table = mtk_iommu_of_ids, .pm = &mtk_iommu_pm_ops, } From c4e7af283374329a6eca97dfa782d710417f3e19 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Sat, 4 Nov 2017 23:52:54 -0500 Subject: [PATCH 0441/1013] thunderbolt: tb: fix use after free in tb_activate_pcie_devices MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit a2e373438f72391493a4425efc1b82030b6b4fd5 ] Add a ̣̣continue statement in order to avoid using a previously free'd pointer tunnel in list_add. Addresses-Coverity-ID: 1415336 Fixes: 9d3cce0b6136 ("thunderbolt: Introduce thunderbolt bus and connection manager") Signed-off-by: Gustavo A. R. 
Silva Acked-by: Mika Westerberg Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/thunderbolt/tb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index d674e06767a5..1424581fd9af 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -225,6 +225,7 @@ static void tb_activate_pcie_devices(struct tb *tb) tb_port_info(up_port, "PCIe tunnel activation failed, aborting\n"); tb_pci_free(tunnel); + continue; } list_add(&tunnel->list, &tcm->tunnel_list); From d815f4efbb2a04e57632fc4daa6ebaf5fca52123 Mon Sep 17 00:00:00 2001 From: KUWAZAWA Takuya Date: Sun, 15 Oct 2017 20:54:10 +0900 Subject: [PATCH 0442/1013] netfilter: ipvs: Fix inappropriate output of procfs [ Upstream commit c5504f724c86ee925e7ffb80aa342cfd57959b13 ] Information about ipvs in different network namespace can be seen via procfs. How to reproduce: # ip netns add ns01 # ip netns add ns02 # ip netns exec ns01 ip a add dev lo 127.0.0.1/8 # ip netns exec ns02 ip a add dev lo 127.0.0.1/8 # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80 # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80 The ipvsadm displays information about its own network namespace only. # ip netns exec ns01 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.1:80 wlc # ip netns exec ns02 ipvsadm -Ln IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 10.1.1.2:80 wlc But I can see information about other network namespace via procfs. # ip netns exec ns01 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010101:0050 wlc TCP 0A010102:0050 wlc # ip netns exec ns02 cat /proc/net/ip_vs IP Virtual Server version 1.2.1 (size=4096) Prot LocalAddress:Port Scheduler Flags -> RemoteAddress:Port Forward Weight ActiveConn InActConn TCP 0A010102:0050 wlc Signed-off-by: KUWAZAWA Takuya Acked-by: Julian Anastasov Signed-off-by: Pablo Neira Ayuso Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/netfilter/ipvs/ip_vs_ctl.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c index 4f940d7eb2f7..b3245f9a37d1 100644 --- a/net/netfilter/ipvs/ip_vs_ctl.c +++ b/net/netfilter/ipvs/ip_vs_ctl.c @@ -2034,12 +2034,16 @@ static int ip_vs_info_seq_show(struct seq_file *seq, void *v) seq_puts(seq, " -> RemoteAddress:Port Forward Weight ActiveConn InActConn\n"); } else { + struct net *net = seq_file_net(seq); + struct netns_ipvs *ipvs = net_ipvs(net); const struct ip_vs_service *svc = v; const struct ip_vs_iter *iter = seq->private; const struct ip_vs_dest *dest; struct ip_vs_scheduler *sched = rcu_dereference(svc->scheduler); char *sched_name = sched ? sched->name : "none"; + if (svc->ipvs != ipvs) + return 0; if (iter->table == ip_vs_svc_table) { #ifdef CONFIG_IP_VS_IPV6 if (svc->af == AF_INET6) From 2637cb1e3d8086cd31a96e60c3e7e0aa74c043f4 Mon Sep 17 00:00:00 2001 From: "William A. Kennington III" Date: Fri, 22 Sep 2017 16:58:00 -0700 Subject: [PATCH 0443/1013] powerpc/opal: Fix EBUSY bug in acquiring tokens [ Upstream commit 71e24d7731a2903b1ae2bba2b2971c654d9c2aa6 ] The current code checks the completion map to look for the first token that is complete. 
In some cases, a completion can come in but the token can still be on lease to the caller processing the completion. If this completed but unreleased token is the first token found in the bitmap by another tasks trying to acquire a token, then the __test_and_set_bit call will fail since the token will still be on lease. The acquisition will then fail with an EBUSY. This patch reorganizes the acquisition code to look at the opal_async_token_map for an unleased token. If the token has no lease it must have no outstanding completions so we should never see an EBUSY, unless we have leased out too many tokens. Since opal_async_get_token_inrerruptible is protected by a semaphore, we will practically never see EBUSY anymore. Fixes: 8d7248232208 ("powerpc/powernv: Infrastructure to support OPAL async completion") Signed-off-by: William A. Kennington III Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/powernv/opal-async.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/platforms/powernv/opal-async.c b/arch/powerpc/platforms/powernv/opal-async.c index cf33769a7b72..45b3feb8aa2f 100644 --- a/arch/powerpc/platforms/powernv/opal-async.c +++ b/arch/powerpc/platforms/powernv/opal-async.c @@ -39,18 +39,18 @@ int __opal_async_get_token(void) int token; spin_lock_irqsave(&opal_async_comp_lock, flags); - token = find_first_bit(opal_async_complete_map, opal_max_async_tokens); + token = find_first_zero_bit(opal_async_token_map, opal_max_async_tokens); if (token >= opal_max_async_tokens) { token = -EBUSY; goto out; } - if (__test_and_set_bit(token, opal_async_token_map)) { + if (!__test_and_clear_bit(token, opal_async_complete_map)) { token = -EBUSY; goto out; } - __clear_bit(token, opal_async_complete_map); + __set_bit(token, opal_async_token_map); out: spin_unlock_irqrestore(&opal_async_comp_lock, flags); From 0fe5286e49d4e1fe7eba347302b3d3dee217eade Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 18 Oct 2017 11:16:47 +0200 Subject: [PATCH 0444/1013] powerpc/ipic: Fix status get and status clear [ Upstream commit 6b148a7ce72a7f87c81cbcde48af014abc0516a9 ] IPIC Status is provided by register IPIC_SERSR and not by IPIC_SERMR which is the mask register. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/sysdev/ipic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/sysdev/ipic.c b/arch/powerpc/sysdev/ipic.c index 16f1edd78c40..535cf1f6941c 100644 --- a/arch/powerpc/sysdev/ipic.c +++ b/arch/powerpc/sysdev/ipic.c @@ -846,12 +846,12 @@ void ipic_disable_mcp(enum ipic_mcp_irq mcp_irq) u32 ipic_get_mcp_status(void) { - return ipic_read(primary_ipic->regs, IPIC_SERMR); + return ipic_read(primary_ipic->regs, IPIC_SERSR); } void ipic_clear_mcp_status(u32 mask) { - ipic_write(primary_ipic->regs, IPIC_SERMR, mask); + ipic_write(primary_ipic->regs, IPIC_SERSR, mask); } /* Return an interrupt vector or 0 if no interrupt is pending. */ From f8b31d88a698909f0bcc4142e4637e2baee5c320 Mon Sep 17 00:00:00 2001 From: Tyrel Datwyler Date: Thu, 28 Sep 2017 20:19:20 -0400 Subject: [PATCH 0445/1013] powerpc/pseries/vio: Dispose of virq mapping on vdevice unregister [ Upstream commit b8f89fea599d91e674497aad572613eb63181f31 ] When a vdevice is DLPAR removed from the system the vio subsystem doesn't bother unmapping the virq from the irq_domain. 
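A sketch of the kind of guard described, with hypothetical structure and field names (not iscsi-target's own): warn in the context that is corrupting the list, not later when the freed memory happens to be reused.

#include <linux/list.h>
#include <linux/bug.h>

struct demo_cmd {
        struct list_head node;  /* linked on a per-connection list, set up with INIT_LIST_HEAD */
};

static void demo_remove_cmd(struct demo_cmd *cmd)
{
        /*
         * Catch a command that was never added, or that was already
         * removed once, at the point where the list is being damaged.
         */
        WARN_ON(list_empty(&cmd->node));
        list_del_init(&cmd->node);
}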
As a result we have a virq mapped to a hardware irq that is no longer valid for the irq_domain. A side effect is that we are left with /proc/irq/ affinity entries, and attempts to modify the smp_affinity of the irq will fail. In the following observed example the kernel log is spammed by ics_rtas_set_affinity errors after the removal of a VSCSI adapter. This is a result of irqbalance trying to adjust the affinity every 10 seconds. rpadlpar_io: slot U8408.E8E.10A7ACV-V5-C25 removed ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3 ics_rtas_set_affinity: ibm,set-xive irq=655385 returns -3 This patch fixes the issue by calling irq_dispose_mapping() on the virq of the viodev on unregister. Fixes: f2ab6219969f ("powerpc/pseries: Add PFO support to the VIO bus") Signed-off-by: Tyrel Datwyler Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/platforms/pseries/vio.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/powerpc/platforms/pseries/vio.c b/arch/powerpc/platforms/pseries/vio.c index 12277bc9fd9e..d86938260a86 100644 --- a/arch/powerpc/platforms/pseries/vio.c +++ b/arch/powerpc/platforms/pseries/vio.c @@ -1592,6 +1592,8 @@ ATTRIBUTE_GROUPS(vio_dev); void vio_unregister_device(struct vio_dev *viodev) { device_unregister(&viodev->dev); + if (viodev->family == VDEVICE) + irq_dispose_mapping(viodev->irq); } EXPORT_SYMBOL(vio_unregister_device); From c90db58f5bb6d5ba203815e28cb5e7eb20b86b27 Mon Sep 17 00:00:00 2001 From: Kuppuswamy Sathyanarayanan Date: Sun, 29 Oct 2017 02:49:54 -0700 Subject: [PATCH 0446/1013] platform/x86: intel_punit_ipc: Fix resource ioremap warning [ Upstream commit 6cc8cbbc8868033f279b63e98b26b75eaa0006ab ] For PUNIT device, ISPDRIVER_IPC and GTDDRIVER_IPC resources are not mandatory. So when PMC IPC driver creates a PUNIT device, if these resources are not available then it creates dummy resource entries for these missing resources. But during PUNIT device probe, doing ioremap on these dummy resources generates following warning messages. intel_punit_ipc: can't request region for resource [mem 0x00000000] intel_punit_ipc: can't request region for resource [mem 0x00000000] intel_punit_ipc: can't request region for resource [mem 0x00000000] intel_punit_ipc: can't request region for resource [mem 0x00000000] This patch fixes this issue by adding extra check for resource size before performing ioremap operation. 
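For reference, resource_size() evaluates to end - start + 1, so an all-zero placeholder entry still reports a size of 1; that is presumably why the guard tests for a size greater than 1 rather than simply non-zero. A minimal sketch of the pattern, with illustrative names (my_priv, MY_RES_IDX) that are not taken from the driver:

    struct resource *res;
    void __iomem *addr;

    res = platform_get_resource(pdev, IORESOURCE_MEM, MY_RES_IDX);
    if (res && resource_size(res) > 1) {    /* skip dummy placeholder entries */
        addr = devm_ioremap_resource(&pdev->dev, res);
        if (!IS_ERR(addr))
            my_priv->base = addr;           /* only map real, sized regions */
    }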
Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/platform/x86/intel_punit_ipc.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/platform/x86/intel_punit_ipc.c b/drivers/platform/x86/intel_punit_ipc.c index a47a41fc10ad..b5b890127479 100644 --- a/drivers/platform/x86/intel_punit_ipc.c +++ b/drivers/platform/x86/intel_punit_ipc.c @@ -252,28 +252,28 @@ static int intel_punit_get_bars(struct platform_device *pdev) * - GTDRIVER_IPC BASE_IFACE */ res = platform_get_resource(pdev, IORESOURCE_MEM, 2); - if (res) { + if (res && resource_size(res) > 1) { addr = devm_ioremap_resource(&pdev->dev, res); if (!IS_ERR(addr)) punit_ipcdev->base[ISPDRIVER_IPC][BASE_DATA] = addr; } res = platform_get_resource(pdev, IORESOURCE_MEM, 3); - if (res) { + if (res && resource_size(res) > 1) { addr = devm_ioremap_resource(&pdev->dev, res); if (!IS_ERR(addr)) punit_ipcdev->base[ISPDRIVER_IPC][BASE_IFACE] = addr; } res = platform_get_resource(pdev, IORESOURCE_MEM, 4); - if (res) { + if (res && resource_size(res) > 1) { addr = devm_ioremap_resource(&pdev->dev, res); if (!IS_ERR(addr)) punit_ipcdev->base[GTDRIVER_IPC][BASE_DATA] = addr; } res = platform_get_resource(pdev, IORESOURCE_MEM, 5); - if (res) { + if (res && resource_size(res) > 1) { addr = devm_ioremap_resource(&pdev->dev, res); if (!IS_ERR(addr)) punit_ipcdev->base[GTDRIVER_IPC][BASE_IFACE] = addr; From 96d275b841efc6decf61afafc1703f0e9b92253d Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 31 Oct 2017 11:03:18 -0700 Subject: [PATCH 0447/1013] target/iscsi: Detect conn_cmd_list corruption early [ Upstream commit 6eaf69e4ec075f5af236c0c89f75639a195db904 ] Certain behavior of the initiator can cause the target driver to send both a reject and a SCSI response. If that happens two target_put_sess_cmd() calls will occur without the command having been removed from conn_cmd_list. In other words, conn_cmd_list will get corrupted once the freed memory is reused. Although the Linux kernel can detect list corruption if list debugging is enabled, in this case the context in which list corruption is detected is not related to the context that caused list corruption. Hence add WARN_ON() statements that report the context that is causing list corruption. 
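As an aside, this style of check is only meaningful when the removal path detaches nodes with list_del_init(), so that a detached node reads as empty. A generic, hedged sketch of the idea (illustrative names, not the target code):

    /* Free path: the node must already be off every list. */
    static void my_release_cmd(struct my_cmd *cmd)
    {
        WARN_ON(!list_empty(&cmd->node));   /* fires in the guilty context */
        kfree(cmd);
    }

    /* The removal path elsewhere must use list_del_init(&cmd->node),
     * not plain list_del(), for the check above to work. */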
Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Mike Christie Reviewed-by: Hannes Reinecke Signed-off-by: Nicholas Bellinger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/target/iscsi/iscsi_target_util.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/target/iscsi/iscsi_target_util.c b/drivers/target/iscsi/iscsi_target_util.c index 1e36f83b5961..70c6b9bfc04e 100644 --- a/drivers/target/iscsi/iscsi_target_util.c +++ b/drivers/target/iscsi/iscsi_target_util.c @@ -694,6 +694,8 @@ void iscsit_release_cmd(struct iscsi_cmd *cmd) struct iscsi_session *sess; struct se_cmd *se_cmd = &cmd->se_cmd; + WARN_ON(!list_empty(&cmd->i_conn_node)); + if (cmd->conn) sess = cmd->conn->sess; else @@ -716,6 +718,8 @@ void __iscsit_free_cmd(struct iscsi_cmd *cmd, bool check_queues) { struct iscsi_conn *conn = cmd->conn; + WARN_ON(!list_empty(&cmd->i_conn_node)); + if (cmd->data_direction == DMA_TO_DEVICE) { iscsit_stop_dataout_timer(cmd); iscsit_free_r2ts_from_list(cmd); From 68d702c6689d83000060118f940e0adce88e42aa Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 31 Oct 2017 11:03:17 -0700 Subject: [PATCH 0448/1013] target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd() [ Upstream commit cfe2b621bb18d86e93271febf8c6e37622da2d14 ] Avoid that cmd->se_cmd.se_tfo is read after a command has already been freed. Signed-off-by: Bart Van Assche Cc: Christoph Hellwig Cc: Mike Christie Reviewed-by: Hannes Reinecke Signed-off-by: Nicholas Bellinger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/target/iscsi/iscsi_target.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index d9ba4ee2c62b..52fa52c20be0 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -834,6 +834,7 @@ static int iscsit_add_reject_from_cmd( unsigned char *buf) { struct iscsi_conn *conn; + const bool do_put = cmd->se_cmd.se_tfo != NULL; if (!cmd->conn) { pr_err("cmd->conn is NULL for ITT: 0x%08x\n", @@ -864,7 +865,7 @@ static int iscsit_add_reject_from_cmd( * Perform the kref_put now if se_cmd has already been setup by * scsit_setup_scsi_cmd() */ - if (cmd->se_cmd.se_tfo != NULL) { + if (do_put) { pr_debug("iscsi reject: calling target_put_sess_cmd >>>>>>\n"); target_put_sess_cmd(&cmd->se_cmd); } From e389507ad355226019147985c29a5b4282aa1e4d Mon Sep 17 00:00:00 2001 From: tangwenji Date: Fri, 15 Sep 2017 16:03:13 +0800 Subject: [PATCH 0449/1013] iscsi-target: fix memory leak in lio_target_tiqn_addtpg() [ Upstream commit 12d5a43b2dffb6cd28062b4e19024f7982393288 ] tpg must free when call core_tpg_register() return fail Signed-off-by: tangwenji Signed-off-by: Nicholas Bellinger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/target/iscsi/iscsi_target_configfs.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/target/iscsi/iscsi_target_configfs.c b/drivers/target/iscsi/iscsi_target_configfs.c index 0dd4c45f7575..0ebc4818e132 100644 --- a/drivers/target/iscsi/iscsi_target_configfs.c +++ b/drivers/target/iscsi/iscsi_target_configfs.c @@ -1123,7 +1123,7 @@ static struct se_portal_group *lio_target_tiqn_addtpg( ret = core_tpg_register(wwn, &tpg->tpg_se_tpg, SCSI_PROTOCOL_ISCSI); if (ret < 0) - return NULL; + goto free_out; ret = iscsit_tpg_add_portal_group(tiqn, tpg); if (ret != 0) @@ -1135,6 +1135,7 @@ static struct se_portal_group *lio_target_tiqn_addtpg( return &tpg->tpg_se_tpg; 
out: core_tpg_deregister(&tpg->tpg_se_tpg); +free_out: kfree(tpg); return NULL; } From 23d3f106f88ca21fd2911ae2f3a4f8aa7b56fddc Mon Sep 17 00:00:00 2001 From: tangwenji Date: Thu, 24 Aug 2017 19:59:37 +0800 Subject: [PATCH 0450/1013] target:fix condition return in core_pr_dump_initiator_port() [ Upstream commit 24528f089d0a444070aa4f715ace537e8d6bf168 ] When is pr_reg->isid_present_at_reg is false,this function should return. This fixes a regression originally introduced by: commit d2843c173ee53cf4c12e7dfedc069a5bc76f0ac5 Author: Andy Grover Date: Thu May 16 10:40:55 2013 -0700 target: Alter core_pr_dump_initiator_port for ease of use Signed-off-by: tangwenji Signed-off-by: Nicholas Bellinger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/target/target_core_pr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/target/target_core_pr.c b/drivers/target/target_core_pr.c index 9f25c9c6f67d..4ba5004a069e 100644 --- a/drivers/target/target_core_pr.c +++ b/drivers/target/target_core_pr.c @@ -58,8 +58,10 @@ void core_pr_dump_initiator_port( char *buf, u32 size) { - if (!pr_reg->isid_present_at_reg) + if (!pr_reg->isid_present_at_reg) { buf[0] = '\0'; + return; + } snprintf(buf, size, ",i,0x%s", pr_reg->pr_reg_isid); } From f6a341e87c656f1e2e9959cfdd2d02561724558f Mon Sep 17 00:00:00 2001 From: Jiang Yi Date: Fri, 11 Aug 2017 11:29:44 +0800 Subject: [PATCH 0451/1013] target/file: Do not return error for UNMAP if length is zero [ Upstream commit 594e25e73440863981032d76c9b1e33409ceff6e ] The function fd_execute_unmap() in target_core_file.c calles ret = file->f_op->fallocate(file, mode, pos, len); Some filesystems implement fallocate() to return error if length is zero (e.g. btrfs) but according to SCSI Block Commands spec UNMAP should return success for zero length. Signed-off-by: Jiang Yi Signed-off-by: Nicholas Bellinger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/target/target_core_file.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/target/target_core_file.c b/drivers/target/target_core_file.c index c629817a8854..9b2c0c773022 100644 --- a/drivers/target/target_core_file.c +++ b/drivers/target/target_core_file.c @@ -482,6 +482,10 @@ fd_execute_unmap(struct se_cmd *cmd, sector_t lba, sector_t nolb) struct inode *inode = file->f_mapping->host; int ret; + if (!nolb) { + return 0; + } + if (cmd->se_dev->dev_attrib.pi_prot_type) { ret = fd_do_prot_unmap(cmd, lba, nolb); if (ret) From ddf2588a05c820b305bc6db2fe6e498a3bc6fde9 Mon Sep 17 00:00:00 2001 From: Liu Bo Date: Fri, 3 Nov 2017 11:24:44 -0600 Subject: [PATCH 0452/1013] badblocks: fix wrong return value in badblocks_set if badblocks are disabled [ Upstream commit 39b4954c0a1556f8f7f1fdcf59a227117fcd8a0b ] MD's rdev_set_badblocks() expects that badblocks_set() returns 1 if badblocks are disabled, otherwise, rdev_set_badblocks() will record superblock changes and return success in that case and md will fail to report an IO error which it should. This bug has existed since badblocks were introduced in commit 9e0e252a048b ("badblocks: Add core badblock management code"). 
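In other words, the contract is that badblocks_set() returns 0 only when the range was actually recorded, and non-zero when it could not be. A simplified, hedged sketch of the caller-side expectation (not the actual md code):

    if (badblocks_set(&rdev->badblocks, sector, sectors, 0) == 0) {
        /* recorded: safe to ack the I/O and schedule a superblock update */
    } else {
        /* not recorded (e.g. badblocks disabled): report the I/O error */
    }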
Signed-off-by: Liu Bo Acked-by: Guoqing Jiang Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- block/badblocks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/block/badblocks.c b/block/badblocks.c index 43c71166e1e2..91f7bcf979d3 100644 --- a/block/badblocks.c +++ b/block/badblocks.c @@ -178,7 +178,7 @@ int badblocks_set(struct badblocks *bb, sector_t s, int sectors, if (bb->shift < 0) /* badblocks are disabled */ - return 0; + return 1; if (bb->shift) { /* round the start down, and the end up */ From 07a93b48a0f73e0cb74355a7dd80c6ad55bed674 Mon Sep 17 00:00:00 2001 From: Gary R Hook Date: Fri, 3 Nov 2017 10:50:34 -0600 Subject: [PATCH 0453/1013] iommu/amd: Limit the IOVA page range to the specified addresses [ Upstream commit b92b4fb5c14257c0e7eae291ecc1f7b1962e1699 ] The extent of pages specified when applying a reserved region should include up to the last page of the range, but not the page following the range. Signed-off-by: Gary R Hook Fixes: 8d54d6c8b8f3 ('iommu/amd: Implement apply_dm_region call-back') Signed-off-by: Alex Williamson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/amd_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c index 8e8874d23717..99a2a57b6cfd 100644 --- a/drivers/iommu/amd_iommu.c +++ b/drivers/iommu/amd_iommu.c @@ -3155,7 +3155,7 @@ static void amd_iommu_apply_resv_region(struct device *dev, unsigned long start, end; start = IOVA_PFN(region->start); - end = IOVA_PFN(region->start + region->length); + end = IOVA_PFN(region->start + region->length - 1); WARN_ON_ONCE(reserve_iova(&dma_dom->iovad, start, end) == NULL); } From 79f41e0f8ae0e3298bfcfd8c085b5242add83e9b Mon Sep 17 00:00:00 2001 From: Eryu Guan Date: Wed, 1 Nov 2017 21:43:50 -0700 Subject: [PATCH 0454/1013] xfs: truncate pagecache before writeback in xfs_setattr_size() [ Upstream commit 350976ae21873b0d36584ea005076356431b8f79 ] On truncate down, if new size is not block size aligned, we zero the rest of block to avoid exposing stale data to user, and iomap_truncate_page() skips zeroing if the range is already in unwritten state or a hole. Then we writeback from on-disk i_size to the new size if this range hasn't been written to disk yet, and truncate page cache beyond new EOF and set in-core i_size. The problem is that we could write data between di_size and newsize before removing the page cache beyond newsize, as the extents may still be in unwritten state right after a buffer write. As such, the page of data that newsize lies in has not been zeroed by page cache invalidation before it is written, and xfs_do_writepage() hasn't triggered it's "zero data beyond EOF" case because we haven't updated in-core i_size yet. Then a subsequent mmap read could see non-zeros past EOF. I occasionally see this in fsx runs in fstests generic/112, a simplified fsx operation sequence is like (assuming 4k block size xfs): fallocate 0x0 0x1000 0x0 keep_size write 0x0 0x1000 0x0 truncate 0x0 0x800 0x1000 punch_hole 0x0 0x800 0x800 mapread 0x0 0x800 0x800 where fallocate allocates unwritten extent but doesn't update i_size, buffer write populates the page cache and extent is still unwritten, truncate skips zeroing page past new EOF and writes the page to disk, punch_hole invalidates the page cache, at last mapread reads the block back and sees non-zero beyond EOF. 
Fix it by moving truncate_setsize() to before writeback so the page cache invalidation zeros the partial page at the new EOF. This also triggers "zero data beyond EOF" in xfs_do_writepage() at writeback time, because newsize has been set and page straddles the newsize. Also fixed the wrong 'end' param of filemap_write_and_wait_range() call while we're at it, the 'end' is inclusive and should be 'newsize - 1'. Suggested-by: Dave Chinner Signed-off-by: Eryu Guan Acked-by: Dave Chinner Reviewed-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/xfs/xfs_iops.c | 36 ++++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 17081c77ef86..f24e5b6cfc86 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -885,22 +885,6 @@ xfs_setattr_size( if (error) return error; - /* - * We are going to log the inode size change in this transaction so - * any previous writes that are beyond the on disk EOF and the new - * EOF that have not been written out need to be written here. If we - * do not write the data out, we expose ourselves to the null files - * problem. Note that this includes any block zeroing we did above; - * otherwise those blocks may not be zeroed after a crash. - */ - if (did_zeroing || - (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) { - error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, - ip->i_d.di_size, newsize); - if (error) - return error; - } - /* * We've already locked out new page faults, so now we can safely remove * pages from the page cache knowing they won't get refaulted until we @@ -917,9 +901,29 @@ xfs_setattr_size( * user visible changes). There's not much we can do about this, except * to hope that the caller sees ENOMEM and retries the truncate * operation. + * + * And we update in-core i_size and truncate page cache beyond newsize + * before writeback the [di_size, newsize] range, so we're guaranteed + * not to write stale data past the new EOF on truncate down. */ truncate_setsize(inode, newsize); + /* + * We are going to log the inode size change in this transaction so + * any previous writes that are beyond the on disk EOF and the new + * EOF that have not been written out need to be written here. If we + * do not write the data out, we expose ourselves to the null files + * problem. Note that this includes any block zeroing we did above; + * otherwise those blocks may not be zeroed after a crash. + */ + if (did_zeroing || + (newsize > ip->i_d.di_size && oldsize != ip->i_d.di_size)) { + error = filemap_write_and_wait_range(VFS_I(ip)->i_mapping, + ip->i_d.di_size, newsize - 1); + if (error) + return error; + } + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp); if (error) return error; From ade07fa32f5ab083d558199e906d612b5d535553 Mon Sep 17 00:00:00 2001 From: Suzuki K Poulose Date: Fri, 3 Nov 2017 11:45:18 +0000 Subject: [PATCH 0455/1013] arm-ccn: perf: Prevent module unload while PMU is in use [ Upstream commit c7f5828bf77dcbd61d51f4736c1d5aa35663fbb4 ] When the PMU driver is built as a module, the perf expects the pmu->module to be valid, so that the driver is prevented from being unloaded while it is in use. Fix the CCN pmu driver to fill in this field. 
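The perf core pins pmu->module (roughly, try_module_get() when an event is created and module_put() on teardown), so any PMU built as a module should set it. A minimal sketch for a modular PMU driver, with illustrative names:

    static struct pmu my_pmu = {
        .module      = THIS_MODULE,   /* keeps the module loaded while events exist */
        .task_ctx_nr = perf_invalid_context,
        .event_init  = my_event_init,
        .add         = my_event_add,
        .del         = my_event_del,
    };

    ret = perf_pmu_register(&my_pmu, "my_pmu", -1);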
Fixes: a33b0daab73a0 ("bus: ARM CCN PMU driver") Cc: Pawel Moll Cc: Will Deacon Acked-by: Mark Rutland Signed-off-by: Suzuki K Poulose Signed-off-by: Will Deacon Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bus/arm-ccn.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bus/arm-ccn.c b/drivers/bus/arm-ccn.c index 03d7faf51c2b..72fd1750134d 100644 --- a/drivers/bus/arm-ccn.c +++ b/drivers/bus/arm-ccn.c @@ -1280,6 +1280,7 @@ static int arm_ccn_pmu_init(struct arm_ccn *ccn) /* Perf driver registration */ ccn->dt.pmu = (struct pmu) { + .module = THIS_MODULE, .attr_groups = arm_ccn_pmu_attr_groups, .task_ctx_nr = perf_invalid_context, .event_init = arm_ccn_pmu_event_init, From 6479a108b35857674ca2a74e049e3089d12ee444 Mon Sep 17 00:00:00 2001 From: Robert Baronescu Date: Tue, 10 Oct 2017 13:22:00 +0300 Subject: [PATCH 0456/1013] crypto: tcrypt - fix buffer lengths in test_aead_speed() [ Upstream commit 7aacbfcb331ceff3ac43096d563a1f93ed46e35e ] Fix the way the length of the buffers used for encryption / decryption are computed. For e.g. in case of encryption, input buffer does not contain an authentication tag. Signed-off-by: Robert Baronescu Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- crypto/tcrypt.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c index 0022a18d36ee..f5f58a6eee5d 100644 --- a/crypto/tcrypt.c +++ b/crypto/tcrypt.c @@ -340,7 +340,7 @@ static void test_aead_speed(const char *algo, int enc, unsigned int secs, } sg_init_aead(sg, xbuf, - *b_size + (enc ? authsize : 0)); + *b_size + (enc ? 0 : authsize)); sg_init_aead(sgout, xoutbuf, *b_size + (enc ? authsize : 0)); @@ -348,7 +348,9 @@ static void test_aead_speed(const char *algo, int enc, unsigned int secs, sg_set_buf(&sg[0], assoc, aad_size); sg_set_buf(&sgout[0], assoc, aad_size); - aead_request_set_crypt(req, sg, sgout, *b_size, iv); + aead_request_set_crypt(req, sg, sgout, + *b_size + (enc ? 0 : authsize), + iv); aead_request_set_ad(req, aad_size); if (secs) From f58a90e027d8ff3f24e20429500f0923663afbd4 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Fri, 3 Nov 2017 12:21:21 +0100 Subject: [PATCH 0457/1013] mm: Handle 0 flags in _calc_vm_trans() macro [ Upstream commit 592e254502041f953e84d091eae2c68cba04c10b ] _calc_vm_trans() does not handle the situation when some of the passed flags are 0 (which can happen if these VM flags do not make sense for the architecture). Improve the _calc_vm_trans() macro to return 0 in such situation. Since all passed flags are constant, this does not add any runtime overhead. Signed-off-by: Jan Kara Signed-off-by: Dan Williams Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/linux/mman.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/linux/mman.h b/include/linux/mman.h index 7c87b6652244..835a6f2fd8d4 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -64,8 +64,9 @@ static inline bool arch_validate_prot(unsigned long prot) * ("bit1" and "bit2" must be single bits) */ #define _calc_vm_trans(x, bit1, bit2) \ + ((!(bit1) || !(bit2)) ? 0 : \ ((bit1) <= (bit2) ? ((x) & (bit1)) * ((bit2) / (bit1)) \ - : ((x) & (bit1)) / ((bit1) / (bit2))) + : ((x) & (bit1)) / ((bit1) / (bit2)))) /* * Combine the mmap "prot" argument into "vm_flags" used internally. 
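A worked expansion of the macro above, with hypothetical bit values chosen only to show the scaling:

    /* bit1 = 0x0100, bit2 = 0x0010 (bit1 > bit2, so scale down): */
    _calc_vm_trans(x, 0x0100, 0x0010)
        -> ((x) & 0x0100) / (0x0100 / 0x0010)

    /* A set 0x0100 bit in x becomes a set 0x0010 bit in the result.
     * If an architecture defines either bit as 0 (flag unsupported),
     * the old form produced a constant division by zero; the added
     * (!(bit1) || !(bit2)) ? 0 : ... guard now folds the whole
     * expression to 0 at compile time instead. */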
From 62785073959347a14eecb395e810a97e745996f2 Mon Sep 17 00:00:00 2001 From: Fuyun Liang Date: Fri, 3 Nov 2017 12:18:26 +0800 Subject: [PATCH 0458/1013] net: hns3: fix for getting advertised_caps in hns3_get_link_ksettings [ Upstream commit 2b39cabb2a283cea0c3d96d9370575371726164f ] This patch fixes a bug for ethtool's get_link_ksettings(). The advertising for autoneg is always added to advertised_caps whether autoneg is enable or disable. This patch fixes it. Fixes: 496d03e (net: hns3: Add Ethtool support to HNS3 driver) Signed-off-by: Fuyun Liang Signed-off-by: Lipeng Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c index d636399232fb..e590d96e434a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_ethtool.c @@ -375,6 +375,9 @@ static int hns3_get_link_ksettings(struct net_device *netdev, break; } + if (!cmd->base.autoneg) + advertised_caps &= ~HNS3_LM_AUTONEG_BIT; + /* now, map driver link modes to ethtool link modes */ hns3_driv_to_eth_caps(supported_caps, cmd, false); hns3_driv_to_eth_caps(advertised_caps, cmd, true); From 7018a57cf7018dca6469d73b485d4a35cea97649 Mon Sep 17 00:00:00 2001 From: qumingguang Date: Thu, 2 Nov 2017 20:45:22 +0800 Subject: [PATCH 0459/1013] net: hns3: Fix a misuse to devm_free_irq [ Upstream commit ae064e6123f89f90af7e4ea190cc0c612643ca93 ] we should use free_irq to free the nic irq during the unloading time. because we use request_irq to apply it when nic up. It will crash if up net device after reset the port. This patch fixes the issue. Signed-off-by: qumingguang Signed-off-by: Lipeng Signed-off-by: Yunsheng Lin Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c index 35369e1c8036..981fc62ce043 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c @@ -2460,9 +2460,8 @@ static int hns3_nic_uninit_vector_data(struct hns3_nic_priv *priv) (void)irq_set_affinity_hint( priv->tqp_vector[i].vector_irq, NULL); - devm_free_irq(&pdev->dev, - priv->tqp_vector[i].vector_irq, - &priv->tqp_vector[i]); + free_irq(priv->tqp_vector[i].vector_irq, + &priv->tqp_vector[i]); } priv->ring_data[i].ring->irq_init_flag = HNS3_VECTOR_NOT_INITED; From 3c5fed838b473e565e44ada69ff51e07317e2c8a Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Thu, 2 Nov 2017 10:30:11 +0100 Subject: [PATCH 0460/1013] staging: rtl8188eu: Revert part of "staging: rtl8188eu: fix comments with lines over 80 characters" [ Upstream commit 4004a9870bbefdb6644c3d2033f5315920a3b669 ] Commit 74e1e498e84e ("staging: rtl8188eu: fix comments with lines over 80 characters") not only changed comments but also changed an if check: -if (pmlmepriv->cur_network.join_res != true) { +if (!(pmlmepriv->cur_network.join_res)) { This is not equivalent as join_res is an int and can have values such as -2 and -3. Note for the next time, please only make one type of changes in a single clean-up commit. 
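A short illustration (hypothetical value, placeholder function) of why the two tests are not equivalent for a plain int:

    int join_res = -2;              /* e.g. a join failure code */

    if (!(join_res))                /* false: -2 is non-zero    */
        do_first_time_setup();
    if (join_res != true)           /* true:  -2 != 1           */
        do_first_time_setup();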
Fixes: 74e1e498e84e ("staging: rtl8188eu: fix comments with lines over 80 ...") Cc: Juliana Rodrigues Signed-off-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_ap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/rtl8188eu/core/rtw_ap.c b/drivers/staging/rtl8188eu/core/rtw_ap.c index 32a483769975..fa611455109a 100644 --- a/drivers/staging/rtl8188eu/core/rtw_ap.c +++ b/drivers/staging/rtl8188eu/core/rtw_ap.c @@ -754,7 +754,7 @@ static void start_bss_network(struct adapter *padapter, u8 *pbuf) } /* setting only at first time */ - if (!(pmlmepriv->cur_network.join_res)) { + if (pmlmepriv->cur_network.join_res != true) { /* WEP Key will be set before this function, do not * clear CAM. */ From 3c38ce8767c6203bf3d479a570ece2a38502ac6b Mon Sep 17 00:00:00 2001 From: Chen Zhong Date: Thu, 5 Oct 2017 11:50:23 +0800 Subject: [PATCH 0461/1013] clk: mediatek: add the option for determining PLL source clock [ Upstream commit c955bf3998efa3355790a4d8c82874582f1bc727 ] Since the previous setup always sets the PLL using crystal 26MHz, this doesn't always happen in every MediaTek platform. So the patch added flexibility for assigning extra member for determining the PLL source clock. Signed-off-by: Chen Zhong Signed-off-by: Sean Wang Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/mediatek/clk-mtk.h | 1 + drivers/clk/mediatek/clk-pll.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/clk/mediatek/clk-mtk.h b/drivers/clk/mediatek/clk-mtk.h index f5d6b70ce189..210ce8e8025e 100644 --- a/drivers/clk/mediatek/clk-mtk.h +++ b/drivers/clk/mediatek/clk-mtk.h @@ -216,6 +216,7 @@ struct mtk_pll_data { uint32_t pcw_reg; int pcw_shift; const struct mtk_pll_div_table *div_table; + const char *parent_name; }; void mtk_clk_register_plls(struct device_node *node, diff --git a/drivers/clk/mediatek/clk-pll.c b/drivers/clk/mediatek/clk-pll.c index a409142e9346..7598477ff60f 100644 --- a/drivers/clk/mediatek/clk-pll.c +++ b/drivers/clk/mediatek/clk-pll.c @@ -303,7 +303,10 @@ static struct clk *mtk_clk_register_pll(const struct mtk_pll_data *data, init.name = data->name; init.flags = (data->flags & PLL_AO) ? CLK_IS_CRITICAL : 0; init.ops = &mtk_pll_ops; - init.parent_names = &parent_name; + if (data->parent_name) + init.parent_names = &data->parent_name; + else + init.parent_names = &parent_name; init.num_parents = 1; clk = clk_register(NULL, &pll->hw); From 27731e18a571b07c4042d83679184c78957bac1a Mon Sep 17 00:00:00 2001 From: Adriana Reus Date: Mon, 2 Oct 2017 13:32:10 +0300 Subject: [PATCH 0462/1013] clk: imx: imx7d: Fix parent clock for OCRAM_CLK [ Upstream commit edc5a8e754aba9c6eaeddd18cb1e72462f99b16c ] The parent of OCRAM_CLK should be axi_main_root_clk and not axi_post_div. 
before: axi_src 1 1 332307692 0 0 axi_cg 1 1 332307692 0 0 axi_pre_div 1 1 332307692 0 0 axi_post_div 1 1 332307692 0 0 ocram_clk 0 0 332307692 0 0 main_axi_root_clk 1 1 332307692 0 0 after: axi_src 1 1 332307692 0 0 axi_cg 1 1 332307692 0 0 axi_pre_div 1 1 332307692 0 0 axi_post_div 1 1 332307692 0 0 main_axi_root_clk 1 1 332307692 0 0 ocram_clk 0 0 332307692 0 0 Reference Doc: i.MX 7D Reference Manual - Chap 5, p 516 (https://www.nxp.com/docs/en/reference-manual/IMX7DRM.pdf) Fixes: 8f6d8094b215 ("ARM: imx: add imx7d clk tree support") Signed-off-by: Adriana Reus Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/imx/clk-imx7d.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/imx/clk-imx7d.c b/drivers/clk/imx/clk-imx7d.c index 2305699db467..0ac9b30c8b90 100644 --- a/drivers/clk/imx/clk-imx7d.c +++ b/drivers/clk/imx/clk-imx7d.c @@ -797,7 +797,7 @@ static void __init imx7d_clocks_init(struct device_node *ccm_node) clks[IMX7D_MAIN_AXI_ROOT_CLK] = imx_clk_gate4("main_axi_root_clk", "axi_post_div", base + 0x4040, 0); clks[IMX7D_DISP_AXI_ROOT_CLK] = imx_clk_gate4("disp_axi_root_clk", "disp_axi_post_div", base + 0x4050, 0); clks[IMX7D_ENET_AXI_ROOT_CLK] = imx_clk_gate4("enet_axi_root_clk", "enet_axi_post_div", base + 0x4060, 0); - clks[IMX7D_OCRAM_CLK] = imx_clk_gate4("ocram_clk", "axi_post_div", base + 0x4110, 0); + clks[IMX7D_OCRAM_CLK] = imx_clk_gate4("ocram_clk", "main_axi_root_clk", base + 0x4110, 0); clks[IMX7D_OCRAM_S_CLK] = imx_clk_gate4("ocram_s_clk", "ahb_root_clk", base + 0x4120, 0); clks[IMX7D_DRAM_ROOT_CLK] = imx_clk_gate4("dram_root_clk", "dram_post_div", base + 0x4130, 0); clks[IMX7D_DRAM_PHYM_ROOT_CLK] = imx_clk_gate4("dram_phym_root_clk", "dram_phym_cg", base + 0x4130, 0); From de0bbe07a49ce9bf5bc4fc1846bd453151e7a334 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=A9bastien=20Szymanski?= Date: Tue, 1 Aug 2017 12:40:07 +0200 Subject: [PATCH 0463/1013] clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit c68ee58d9ee7b856ac722f18f4f26579c8fbd2b4 ] On i.MX6 SoCs without VPU (in my case MCIMX6D4AVT10AC), the hdmi driver fails to probe: [ 2.540030] dwhdmi-imx 120000.hdmi: Unsupported HDMI controller (0000:00:00) [ 2.548199] imx-drm display-subsystem: failed to bind 120000.hdmi (ops dw_hdmi_imx_ops): -19 [ 2.557403] imx-drm display-subsystem: master bind failed: -19 That's because hdmi_isfr's parent, video_27m, is not correctly ungated. As explained in commit 5ccc248cc537 ("ARM: imx6q: clk: Add support for mipi_core_cfg clock as a shared clock gate"), video_27m is gated by CCM_CCGR3[CG8]. On i.MX6 SoCs with VPU, the hdmi is working thanks to the CCM_CMEOR[mod_en_ov_vpu] bit which makes the video_27m ungated whatever is in CCM_CCGR3[CG8]. The issue can be reproduced by setting CCMEOR[mod_en_ov_vpu] to 0. Make the HDMI work in every case by setting hdmi_isfr's parent to mipi_core_cfg. 
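On the consumer side nothing changes; the parent swap works because enabling a clock propagates up the parent chain. A hedged sketch of what the HDMI driver already does (the "isfr" name is how dw-hdmi typically requests this clock, shown here only for illustration):

    clk = devm_clk_get(dev, "isfr");
    if (IS_ERR(clk))
        return PTR_ERR(clk);
    ret = clk_prepare_enable(clk);  /* now also ungates the mipi_core_cfg
                                     * parent, and with it the 27 MHz source */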
Signed-off-by: Sébastien Szymanski Reviewed-by: Fabio Estevam Signed-off-by: Stephen Boyd Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/imx/clk-imx6q.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/imx/clk-imx6q.c b/drivers/clk/imx/clk-imx6q.c index c07df719b8a3..8d518ad5dc13 100644 --- a/drivers/clk/imx/clk-imx6q.c +++ b/drivers/clk/imx/clk-imx6q.c @@ -761,7 +761,7 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node) clk[IMX6QDL_CLK_GPU2D_CORE] = imx_clk_gate2("gpu2d_core", "gpu2d_core_podf", base + 0x6c, 24); clk[IMX6QDL_CLK_GPU3D_CORE] = imx_clk_gate2("gpu3d_core", "gpu3d_core_podf", base + 0x6c, 26); clk[IMX6QDL_CLK_HDMI_IAHB] = imx_clk_gate2("hdmi_iahb", "ahb", base + 0x70, 0); - clk[IMX6QDL_CLK_HDMI_ISFR] = imx_clk_gate2("hdmi_isfr", "video_27m", base + 0x70, 4); + clk[IMX6QDL_CLK_HDMI_ISFR] = imx_clk_gate2("hdmi_isfr", "mipi_core_cfg", base + 0x70, 4); clk[IMX6QDL_CLK_I2C1] = imx_clk_gate2("i2c1", "ipg_per", base + 0x70, 6); clk[IMX6QDL_CLK_I2C2] = imx_clk_gate2("i2c2", "ipg_per", base + 0x70, 8); clk[IMX6QDL_CLK_I2C3] = imx_clk_gate2("i2c3", "ipg_per", base + 0x70, 10); From d9e497c9424896581b18f0807652364180401c7b Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Wed, 1 Nov 2017 08:09:59 -0400 Subject: [PATCH 0464/1013] media: camss-vfe: always initialize reg at vfe_set_xbar_cfg() [ Upstream commit 9917fbcfa20ab987d6381fd0365665e5c1402d75 ] if output->wm_num is bigger than 2, the value for reg is not initialized, as warned by smatch: drivers/media/platform/qcom/camss-8x16/camss-vfe.c:633 vfe_set_xbar_cfg() error: uninitialized symbol 'reg'. drivers/media/platform/qcom/camss-8x16/camss-vfe.c:637 vfe_set_xbar_cfg() error: uninitialized symbol 'reg'. That shouldn't happen in practice, so add a logic that will break the loop if i > 1, fixing the warnings. Signed-off-by: Mauro Carvalho Chehab Acked-by: Todor Tomov Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/platform/qcom/camss-8x16/camss-vfe.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/media/platform/qcom/camss-8x16/camss-vfe.c b/drivers/media/platform/qcom/camss-8x16/camss-vfe.c index b22d2dfcd3c2..55232a912950 100644 --- a/drivers/media/platform/qcom/camss-8x16/camss-vfe.c +++ b/drivers/media/platform/qcom/camss-8x16/camss-vfe.c @@ -622,6 +622,9 @@ static void vfe_set_xbar_cfg(struct vfe_device *vfe, struct vfe_output *output, reg = VFE_0_BUS_XBAR_CFG_x_M_PAIR_STREAM_EN; if (p == V4L2_PIX_FMT_NV12 || p == V4L2_PIX_FMT_NV16) reg |= VFE_0_BUS_XBAR_CFG_x_M_PAIR_STREAM_SWAP_INTER_INTRA; + } else { + /* On current devices output->wm_num is always <= 2 */ + break; } if (output->wm_idx[i] % 2 == 1) From 6facfe25e62e702304591bd114bcf7d6067c86e4 Mon Sep 17 00:00:00 2001 From: Leo Yan Date: Fri, 1 Sep 2017 08:47:14 +0800 Subject: [PATCH 0465/1013] clk: hi6220: mark clock cs_atb_syspll as critical [ Upstream commit d2a3671ebe6479483a12f94fcca63c058d95ad64 ] Clock cs_atb_syspll is pll used for coresight trace bus; when clock cs_atb_syspll is disabled and operates its child clock node cs_atb results in system hang. So mark clock cs_atb_syspll as critical to keep it enabled. 
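For background, a clock registered with CLK_IS_CRITICAL is prepared and enabled by the clock core itself at registration time, so it is never gated even when it has no enabled consumers. A generic sketch (illustrative names and offsets, not the HiSilicon code):

    clk = clk_register_gate(NULL, "dbg_bus_gate", "parent_pll",
                            CLK_IS_CRITICAL,        /* core keeps it running */
                            base + GATE_REG_OFFSET, GATE_BIT, 0, NULL);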
Cc: Guodong Xu Cc: Zhangfei Gao Cc: Haojian Zhuang Signed-off-by: Leo Yan Signed-off-by: Michael Turquette Link: lkml.kernel.org/r/1504226835-2115-2-git-send-email-leo.yan@linaro.org Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/hisilicon/clk-hi6220.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/hisilicon/clk-hi6220.c b/drivers/clk/hisilicon/clk-hi6220.c index e786d717f75d..a87809d4bd52 100644 --- a/drivers/clk/hisilicon/clk-hi6220.c +++ b/drivers/clk/hisilicon/clk-hi6220.c @@ -145,7 +145,7 @@ static struct hisi_gate_clock hi6220_separated_gate_clks_sys[] __initdata = { { HI6220_BBPPLL_SEL, "bbppll_sel", "pll0_bbp_gate", CLK_SET_RATE_PARENT|CLK_IGNORE_UNUSED, 0x270, 9, 0, }, { HI6220_MEDIA_PLL_SRC, "media_pll_src", "pll_media_gate", CLK_SET_RATE_PARENT|CLK_IGNORE_UNUSED, 0x270, 10, 0, }, { HI6220_MMC2_SEL, "mmc2_sel", "mmc2_mux1", CLK_SET_RATE_PARENT|CLK_IGNORE_UNUSED, 0x270, 11, 0, }, - { HI6220_CS_ATB_SYSPLL, "cs_atb_syspll", "syspll", CLK_SET_RATE_PARENT|CLK_IGNORE_UNUSED, 0x270, 12, 0, }, + { HI6220_CS_ATB_SYSPLL, "cs_atb_syspll", "syspll", CLK_SET_RATE_PARENT|CLK_IS_CRITICAL, 0x270, 12, 0, }, }; static struct hisi_mux_clock hi6220_mux_clks_sys[] __initdata = { From 85dcb3c850a79751b2239f434a4b6b36a75a865e Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Sat, 14 Oct 2017 17:22:25 +0800 Subject: [PATCH 0466/1013] blk-mq-sched: dispatch from scheduler IFF progress is made in ->dispatch [ Upstream commit 5e3d02bbafad38975099b5848f5ebadedcf7bb7e ] When the hw queue is busy, we shouldn't take requests from the scheduler queue any more, otherwise it is difficult to do IO merge. This patch fixes the awful IO performance on some SCSI devices(lpfc, qla2xxx, ...) when mq-deadline/kyber is used by not taking requests if hw queue is busy. Reviewed-by: Omar Sandoval Reviewed-by: Bart Van Assche Reviewed-by: Christoph Hellwig Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- block/blk-mq-sched.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c index 4ab69435708c..eca011fdfa0e 100644 --- a/block/blk-mq-sched.c +++ b/block/blk-mq-sched.c @@ -94,7 +94,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) struct request_queue *q = hctx->queue; struct elevator_queue *e = q->elevator; const bool has_sched_dispatch = e && e->type->ops.mq.dispatch_request; - bool did_work = false; + bool do_sched_dispatch = true; LIST_HEAD(rq_list); /* RCU or SRCU read lock is needed before checking quiesced flag */ @@ -125,18 +125,18 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx) */ if (!list_empty(&rq_list)) { blk_mq_sched_mark_restart_hctx(hctx); - did_work = blk_mq_dispatch_rq_list(q, &rq_list); + do_sched_dispatch = blk_mq_dispatch_rq_list(q, &rq_list); } else if (!has_sched_dispatch) { blk_mq_flush_busy_ctxs(hctx, &rq_list); blk_mq_dispatch_rq_list(q, &rq_list); } /* - * We want to dispatch from the scheduler if we had no work left - * on the dispatch list, OR if we did have work but weren't able - * to make progress. + * We want to dispatch from the scheduler if there was nothing + * on the dispatch list or we were able to dispatch from the + * dispatch list. 
*/ - if (!did_work && has_sched_dispatch) { + if (do_sched_dispatch && has_sched_dispatch) { do { struct request *rq; From a23c8c70b5ece8d18c49927a0bace4471d8563cb Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Fri, 15 Sep 2017 12:10:13 -0700 Subject: [PATCH 0467/1013] clk: tegra: Use readl_relaxed_poll_timeout_atomic() in tegra210_clock_init() [ Upstream commit 22ef01a203d27fee8b7694020b7e722db7efd2a7 ] Below is the call trace of tegra210_init_pllu() function: start_kernel() -> time_init() --> of_clk_init() ---> tegra210_clock_init() ----> tegra210_pll_init() -----> tegra210_init_pllu() Because the preemption is disabled in the start_kernel before calling time_init, tegra210_init_pllu is actually in an atomic context while it includes a readl_relaxed_poll_timeout that might sleep. So this patch just changes this readl_relaxed_poll_timeout() to its atomic version. Signed-off-by: Nicolin Chen Acked-By: Peter De Schrijver Signed-off-by: Thierry Reding Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/tegra/clk-tegra210.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/clk/tegra/clk-tegra210.c b/drivers/clk/tegra/clk-tegra210.c index 6d7a613f2656..b92867814e2d 100644 --- a/drivers/clk/tegra/clk-tegra210.c +++ b/drivers/clk/tegra/clk-tegra210.c @@ -2566,8 +2566,8 @@ static int tegra210_enable_pllu(void) reg |= PLL_ENABLE; writel(reg, clk_base + PLLU_BASE); - readl_relaxed_poll_timeout(clk_base + PLLU_BASE, reg, - reg & PLL_BASE_LOCK, 2, 1000); + readl_relaxed_poll_timeout_atomic(clk_base + PLLU_BASE, reg, + reg & PLL_BASE_LOCK, 2, 1000); if (!(reg & PLL_BASE_LOCK)) { pr_err("Timed out waiting for PLL_U to lock\n"); return -ETIMEDOUT; From 3d213c4c0a9b0d8fb0958d4a461868525ee6d864 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Micha=C5=82=20Miros=C5=82aw?= Date: Tue, 19 Sep 2017 04:48:10 +0200 Subject: [PATCH 0468/1013] clk: tegra: Fix cclk_lp divisor register MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 54eff2264d3e9fd7e3987de1d7eba1d3581c631e ] According to comments in code and common sense, cclk_lp uses its own divisor, not cclk_g's. Fixes: b08e8c0ecc42 ("clk: tegra: add clock support for Tegra30") Signed-off-by: Michał Mirosław Acked-By: Peter De Schrijver Signed-off-by: Thierry Reding Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/tegra/clk-tegra30.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/tegra/clk-tegra30.c b/drivers/clk/tegra/clk-tegra30.c index a2d163f759b4..07f5203df01c 100644 --- a/drivers/clk/tegra/clk-tegra30.c +++ b/drivers/clk/tegra/clk-tegra30.c @@ -964,7 +964,7 @@ static void __init tegra30_super_clk_init(void) * U71 divider of cclk_lp. */ clk = tegra_clk_register_divider("pll_p_out3_cclklp", "pll_p_out3", - clk_base + SUPER_CCLKG_DIVIDER, 0, + clk_base + SUPER_CCLKLP_DIVIDER, 0, TEGRA_DIVIDER_INT, 16, 8, 1, NULL); clk_register_clkdev(clk, "pll_p_out3_cclklp", NULL); From f7ee900a4f2c76902ce368e93d34b7580cc7ff04 Mon Sep 17 00:00:00 2001 From: Gao Feng Date: Tue, 31 Oct 2017 18:25:37 +0800 Subject: [PATCH 0469/1013] ppp: Destroy the mutex when cleanup [ Upstream commit f02b2320b27c16b644691267ee3b5c110846f49e ] The mutex_destroy only makes sense when enable DEBUG_MUTEX. For the good readbility, it's better to invoke it in exit func when the init func invokes mutex_init. Signed-off-by: Gao Feng Acked-by: Guillaume Nault Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ppp/ppp_generic.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c index e365866600ba..bf14c51f35e1 100644 --- a/drivers/net/ppp/ppp_generic.c +++ b/drivers/net/ppp/ppp_generic.c @@ -959,6 +959,7 @@ static __net_exit void ppp_exit_net(struct net *net) unregister_netdevice_many(&list); rtnl_unlock(); + mutex_destroy(&pn->all_ppp_mutex); idr_destroy(&pn->units_idr); } From 556787a174e6f77c7aec167af846f1297e3860f8 Mon Sep 17 00:00:00 2001 From: Kuninori Morimoto Date: Wed, 1 Nov 2017 07:16:58 +0000 Subject: [PATCH 0470/1013] ASoC: rsnd: rsnd_ssi_run_mods() needs to care ssi_parent_mod [ Upstream commit 21781e87881f9c420871b1d1f3f29d4cd7bffb10 ] SSI parent mod might be NULL. ssi_parent_mod() needs to care about it. Otherwise, it uses negative shift. This patch fixes it. Signed-off-by: Kuninori Morimoto Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/soc/sh/rcar/ssi.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/sound/soc/sh/rcar/ssi.c b/sound/soc/sh/rcar/ssi.c index fffc07e72627..2aef7c00cca1 100644 --- a/sound/soc/sh/rcar/ssi.c +++ b/sound/soc/sh/rcar/ssi.c @@ -198,10 +198,15 @@ static u32 rsnd_ssi_run_mods(struct rsnd_dai_stream *io) { struct rsnd_mod *ssi_mod = rsnd_io_to_mod_ssi(io); struct rsnd_mod *ssi_parent_mod = rsnd_io_to_mod_ssip(io); + u32 mods; - return rsnd_ssi_multi_slaves_runtime(io) | - 1 << rsnd_mod_id(ssi_mod) | - 1 << rsnd_mod_id(ssi_parent_mod); + mods = rsnd_ssi_multi_slaves_runtime(io) | + 1 << rsnd_mod_id(ssi_mod); + + if (ssi_parent_mod) + mods |= 1 << rsnd_mod_id(ssi_parent_mod); + + return mods; } u32 rsnd_ssi_multi_slaves_runtime(struct rsnd_dai_stream *io) From 5642562d0b4f616cac2b4d7f81de0a58f2247b59 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Thu, 19 Oct 2017 19:05:58 +0200 Subject: [PATCH 0471/1013] thermal/drivers/step_wise: Fix temperature regulation misbehavior MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 07209fcf33542c1ff1e29df2dbdf8f29cdaacb10 ] There is a particular situation when the cooling device is cpufreq and the heat dissipation is not efficient enough where the temperature increases little by little until reaching the critical threshold and leading to a SoC reset. The behavior is reproducible on a hikey6220 with bad heat dissipation (eg. stacked with other boards). Running a simple C program doing while(1); for each CPU of the SoC makes the temperature to reach the passive regulation trip point and ends up to the maximum allowed temperature followed by a reset. This issue has been also reported by running the libhugetlbfs test suite. What is observed is a ping pong between two cpu frequencies, 1.2GHz and 900MHz while the temperature continues to grow. It appears the step wise governor calls get_target_state() the first time with the throttle set to true and the trend to 'raising'. The code selects logically the next state, so the cpu frequency decreases from 1.2GHz to 900MHz, so far so good. The temperature decreases immediately but still stays greater than the trip point, then get_target_state() is called again, this time with the throttle set to true *and* the trend to 'dropping'. From there the algorithm assumes we have to step down the state and the cpu frequency jumps back to 1.2GHz. 
But the temperature is still higher than the trip point, so get_target_state() is called with throttle=1 and trend='raising' again, we jump to 900MHz, then get_target_state() is called with throttle=1 and trend='dropping', we jump to 1.2GHz, etc ... but the temperature does not stabilizes and continues to increase. [ 237.922654] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 237.922678] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 237.922690] thermal cooling_device0: cur_state=0 [ 237.922701] thermal cooling_device0: old_target=0, target=1 [ 238.026656] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 238.026680] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1 [ 238.026694] thermal cooling_device0: cur_state=1 [ 238.026707] thermal cooling_device0: old_target=1, target=0 [ 238.134647] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 238.134667] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 238.134679] thermal cooling_device0: cur_state=0 [ 238.134690] thermal cooling_device0: old_target=0, target=1 In this situation the temperature continues to increase while the trend is oscillating between 'dropping' and 'raising'. We need to keep the current state untouched if the throttle is set, so the temperature can decrease or a higher state could be selected, thus preventing this oscillation. Keeping the next_target untouched when 'throttle' is true at 'dropping' time fixes the issue. The following traces show the governor does not change the next state if trend==2 (dropping) and throttle==1. [ 2306.127987] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2306.128009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 2306.128021] thermal cooling_device0: cur_state=0 [ 2306.128031] thermal cooling_device0: old_target=0, target=1 [ 2306.231991] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2306.232016] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=1 [ 2306.232030] thermal cooling_device0: cur_state=1 [ 2306.232042] thermal cooling_device0: old_target=1, target=1 [ 2306.335982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2306.336006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=1 [ 2306.336021] thermal cooling_device0: cur_state=1 [ 2306.336034] thermal cooling_device0: old_target=1, target=1 [ 2306.439984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2306.440008] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0 [ 2306.440022] thermal cooling_device0: cur_state=1 [ 2306.440034] thermal cooling_device0: old_target=1, target=0 [ ... ] After a while, if the temperature continues to increase, the next state becomes 2 which is 720MHz on the hikey. That results in the temperature stabilizing around the trip point. 
[ 2455.831982] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2455.832006] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=0 [ 2455.832019] thermal cooling_device0: cur_state=1 [ 2455.832032] thermal cooling_device0: old_target=1, target=1 [ 2455.935985] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2455.936013] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0 [ 2455.936027] thermal cooling_device0: cur_state=1 [ 2455.936040] thermal cooling_device0: old_target=1, target=1 [ 2456.043984] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=0,throttle=1 [ 2456.044009] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=0,throttle=0 [ 2456.044023] thermal cooling_device0: cur_state=1 [ 2456.044036] thermal cooling_device0: old_target=1, target=1 [ 2456.148001] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=1,throttle=1 [ 2456.148028] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=1,throttle=1 [ 2456.148042] thermal cooling_device0: cur_state=1 [ 2456.148055] thermal cooling_device0: old_target=1, target=2 [ 2456.252009] thermal thermal_zone0: Trip0[type=1,temp=65000]:trend=2,throttle=1 [ 2456.252041] thermal thermal_zone0: Trip1[type=1,temp=75000]:trend=2,throttle=0 [ 2456.252058] thermal cooling_device0: cur_state=2 [ 2456.252075] thermal cooling_device0: old_target=2, target=1 IOW, this change is needed to keep the state for a cooling device if the temperature trend is oscillating while the temperature increases slightly. Without this change, the situation above leads to a catastrophic crash by a hardware reset on hikey. This issue has been reported to happen on an OMAP dra7xx also. Signed-off-by: Daniel Lezcano Cc: Keerthy Cc: John Stultz Cc: Leo Yan Tested-by: Keerthy Reviewed-by: Keerthy Signed-off-by: Eduardo Valentin Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/thermal/step_wise.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/thermal/step_wise.c b/drivers/thermal/step_wise.c index be95826631b7..ee047ca43084 100644 --- a/drivers/thermal/step_wise.c +++ b/drivers/thermal/step_wise.c @@ -31,8 +31,7 @@ * If the temperature is higher than a trip point, * a. if the trend is THERMAL_TREND_RAISING, use higher cooling * state for this trip point - * b. if the trend is THERMAL_TREND_DROPPING, use lower cooling - * state for this trip point + * b. if the trend is THERMAL_TREND_DROPPING, do nothing * c. if the trend is THERMAL_TREND_RAISE_FULL, use upper limit * for this trip point * d. if the trend is THERMAL_TREND_DROP_FULL, use lower limit @@ -94,9 +93,11 @@ static unsigned long get_target_state(struct thermal_instance *instance, if (!throttle) next_target = THERMAL_NO_TARGET; } else { - next_target = cur_state - 1; - if (next_target > instance->upper) - next_target = instance->upper; + if (!throttle) { + next_target = cur_state - 1; + if (next_target > instance->upper) + next_target = instance->upper; + } } break; case THERMAL_TREND_DROP_FULL: From 85d63b76bdd1b570b9c16e60f83eb3185e22b30d Mon Sep 17 00:00:00 2001 From: Kishon Vijay Abraham I Date: Wed, 11 Oct 2017 14:14:36 +0530 Subject: [PATCH 0472/1013] misc: pci_endpoint_test: Fix failure path return values in probe [ Upstream commit 80068c93688f6143100859c4856f895801c1a1d9 ] Return value of pci_endpoint_test_probe is not set properly in a couple of failure cases. Fix it here. 
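The underlying pattern: in a probe routine that unwinds through goto labels, every failure branch must assign the error code before jumping, otherwise the function returns whatever 'err' happened to hold. A condensed, hedged sketch of the two affected branches (not the full driver):

    int err;
    ...
    test->base = test->bar[test_reg_bar];
    if (!test->base) {
        err = -ENOMEM;          /* was missing before this fix */
        goto err_iounmap;
    }

    id = ida_simple_get(&pci_endpoint_test_ida, 0, 0, GFP_KERNEL);
    if (id < 0) {
        err = id;               /* propagate the ida error, was missing */
        goto err_iounmap;
    }
    ...
    err_iounmap:
    ...
    return err;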
Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/misc/pci_endpoint_test.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c index deb203026496..e787a63a321a 100644 --- a/drivers/misc/pci_endpoint_test.c +++ b/drivers/misc/pci_endpoint_test.c @@ -533,6 +533,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev, test->base = test->bar[test_reg_bar]; if (!test->base) { + err = -ENOMEM; dev_err(dev, "Cannot perform PCI test without BAR%d\n", test_reg_bar); goto err_iounmap; @@ -542,6 +543,7 @@ static int pci_endpoint_test_probe(struct pci_dev *pdev, id = ida_simple_get(&pci_endpoint_test_ida, 0, 0, GFP_KERNEL); if (id < 0) { + err = id; dev_err(dev, "unable to get id\n"); goto err_iounmap; } From 95e8d653ea987adfe108173bd5969201a363bf19 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Sat, 30 Sep 2017 11:16:51 +0300 Subject: [PATCH 0473/1013] misc: pci_endpoint_test: Avoid triggering a BUG() [ Upstream commit 846df244ebefbc9f7b91e9ae7a5e5a2e69fb4772 ] If you call ida_simple_remove(&pci_endpoint_test_ida, id) with a negative "id" then it triggers an immediate BUG_ON(). Let's not allow that. Fixes: 2c156ac71c6b ("misc: Add host side PCI driver for PCI test function device") Signed-off-by: Dan Carpenter Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/misc/pci_endpoint_test.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c index e787a63a321a..e089bb6dde3a 100644 --- a/drivers/misc/pci_endpoint_test.c +++ b/drivers/misc/pci_endpoint_test.c @@ -590,6 +590,8 @@ static void pci_endpoint_test_remove(struct pci_dev *pdev) if (sscanf(misc_device->name, DRV_MODULE_NAME ".%d", &id) != 1) return; + if (id < 0) + return; misc_deregister(&test->miscdev); ida_simple_remove(&pci_endpoint_test_ida, id); From aa902be00448f3bc69fd54b1a0a8f5b4b6a47c5b Mon Sep 17 00:00:00 2001 From: Douglas Gilbert Date: Sun, 29 Oct 2017 10:47:19 -0400 Subject: [PATCH 0474/1013] scsi: scsi_debug: write_same: fix error report [ Upstream commit e33d7c56450b0a5c7290cbf9e1581fab5174f552 ] The scsi_debug driver incorrectly suggests there is an error with the SCSI WRITE SAME command when the number_of_logical_blocks is greater than 1. It will also suggest there is an error when NDOB (no data-out buffer) is set and the number_of_logical_blocks is greater than 0. Both are valid, fix. Signed-off-by: Douglas Gilbert Signed-off-by: Martin K. 
Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/scsi_debug.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c index 09ba494f8896..92bc5b2d24ae 100644 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@ -3001,11 +3001,11 @@ static int resp_write_same(struct scsi_cmnd *scp, u64 lba, u32 num, if (-1 == ret) { write_unlock_irqrestore(&atomic_rw, iflags); return DID_ERROR << 16; - } else if (sdebug_verbose && (ret < (num * sdebug_sector_size))) + } else if (sdebug_verbose && !ndob && (ret < sdebug_sector_size)) sdev_printk(KERN_INFO, scp->device, - "%s: %s: cdb indicated=%u, IO sent=%d bytes\n", + "%s: %s: lb size=%u, IO sent=%d bytes\n", my_name, "write same", - num * sdebug_sector_size, ret); + sdebug_sector_size, ret); /* Copy first sector to remaining blocks */ for (i = 1 ; i < num ; i++) From 1678bb970113e62469ca510bff20eb95bbdeaec2 Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Wed, 20 Sep 2017 08:30:04 -0500 Subject: [PATCH 0475/1013] GFS2: Take inode off order_write list when setting jdata flag [ Upstream commit cc555b09d8c3817aeebda43a14ab67049a5653f7 ] This patch fixes a deadlock caused when the jdata flag is set for inodes that are already on the ordered write list. Since it is on the ordered write list, log_flush calls gfs2_ordered_write which calls filemap_fdatawrite. But since the inode had the jdata flag set, that calls gfs2_jdata_writepages, which tries to start a new transaction. A new transaction cannot be started because it tries to acquire the log_flush rwsem which is already locked by the log flush operation. The bottom line is: We cannot switch an inode from ordered to jdata until we eliminate any ordered data pages (via log flush) or any log_flush operation afterward will create the circular dependency above. So we need to flush the log before setting the diskflags to switch the file mode, then we need to remove the inode from the ordered writes list. Before this patch, the log flush was done for jdata->ordered, but that's wrong. If we're going from jdata to ordered, we don't need to call gfs2_log_flush because the call to filemap_fdatawrite will do it for us: filemap_fdatawrite() -> __filemap_fdatawrite_range() __filemap_fdatawrite_range() -> do_writepages() do_writepages() -> gfs2_jdata_writepages() gfs2_jdata_writepages() -> gfs2_log_flush() This patch modifies function do_gfs2_set_flags so that if a file has its jdata flag set, and it's already on the ordered write list, the log will be flushed and it will be removed from the list before setting the flag. 
Signed-off-by: Bob Peterson Signed-off-by: Andreas Gruenbacher Acked-by: Abhijith Das Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/gfs2/file.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c index 33a0cb5701a3..2a29cf3371f6 100644 --- a/fs/gfs2/file.c +++ b/fs/gfs2/file.c @@ -256,7 +256,7 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask) goto out; } if ((flags ^ new_flags) & GFS2_DIF_JDATA) { - if (flags & GFS2_DIF_JDATA) + if (new_flags & GFS2_DIF_JDATA) gfs2_log_flush(sdp, ip->i_gl, NORMAL_FLUSH); error = filemap_fdatawrite(inode->i_mapping); if (error) @@ -264,6 +264,8 @@ static int do_gfs2_set_flags(struct file *filp, u32 reqflags, u32 mask) error = filemap_fdatawait(inode->i_mapping); if (error) goto out; + if (new_flags & GFS2_DIF_JDATA) + gfs2_ordered_del_inode(ip); } error = gfs2_trans_begin(sdp, RES_DINODE, 0); if (error) From 7291d99ebc67be887e5c915489706e6fe720992a Mon Sep 17 00:00:00 2001 From: Adam Sampson Date: Tue, 24 Oct 2017 16:14:46 -0400 Subject: [PATCH 0476/1013] media: usbtv: fix brightness and contrast controls [ Upstream commit b3168c87c0492661badc3e908f977d79e7738a41 ] Because the brightness and contrast controls share a register, usbtv_s_ctrl needs to read the existing values for both controls before inserting the new value. However, the code accidentally wrote to the registers (from an uninitialised stack array), rather than reading them. The user-visible effect of this was that adjusting the brightness would also set the contrast to a random value, and vice versa -- so it wasn't possible to correctly adjust the brightness of usbtv's video output. Tested with an "EasyDAY" UTV007 device. Fixes: c53a846c48f2 ("usbtv: add video controls") Signed-off-by: Adam Sampson Reviewed-by: Lubomir Rintel Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/media/usb/usbtv/usbtv-video.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/media/usb/usbtv/usbtv-video.c b/drivers/media/usb/usbtv/usbtv-video.c index 95b5f4319ec2..3668a04359e8 100644 --- a/drivers/media/usb/usbtv/usbtv-video.c +++ b/drivers/media/usb/usbtv/usbtv-video.c @@ -718,8 +718,8 @@ static int usbtv_s_ctrl(struct v4l2_ctrl *ctrl) */ if (ctrl->id == V4L2_CID_BRIGHTNESS || ctrl->id == V4L2_CID_CONTRAST) { ret = usb_control_msg(usbtv->udev, - usb_sndctrlpipe(usbtv->udev, 0), USBTV_CONTROL_REG, - USB_DIR_OUT | USB_TYPE_VENDOR | USB_RECIP_DEVICE, + usb_rcvctrlpipe(usbtv->udev, 0), USBTV_CONTROL_REG, + USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_DEVICE, 0, USBTV_BASE + 0x0244, (void *)data, 3, 0); if (ret < 0) goto error; From 81309f8b9b69913233733ed61c4c87dab5436902 Mon Sep 17 00:00:00 2001 From: Arun Kumar Neelakantam Date: Mon, 30 Oct 2017 11:11:24 +0530 Subject: [PATCH 0477/1013] rpmsg: glink: Initialize the "intent_req_comp" completion variable [ Upstream commit 2394facb17bcace4b3c19b50202177a5d8903b64 ] The "intent_req_comp" variable is used without initialization which results in NULL pointer dereference in qcom_glink_request_intent(). we need to initialize the completion variable before using it. 
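For context, the usual completion life cycle looks like the sketch below; calling any of the wait/complete helpers on a struct completion that never went through init_completion() operates on uninitialized wait-queue state, which is what crashes here. Generic sketch with illustrative names:

    struct my_channel {
        struct completion req_done;
    };

    /* setup path */
    init_completion(&ch->req_done);

    /* requester */
    if (!wait_for_completion_timeout(&ch->req_done, 10 * HZ))
        return -ETIMEDOUT;

    /* interrupt / response path */
    complete(&ch->req_done);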
Fixes: 27b9c5b66b23 ("rpmsg: glink: Request for intents when unavailable") Signed-off-by: Arun Kumar Neelakantam Signed-off-by: Bjorn Andersson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/rpmsg/qcom_glink_native.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/rpmsg/qcom_glink_native.c b/drivers/rpmsg/qcom_glink_native.c index e3242a0925a6..e8e12c2b1d0e 100644 --- a/drivers/rpmsg/qcom_glink_native.c +++ b/drivers/rpmsg/qcom_glink_native.c @@ -227,6 +227,7 @@ static struct glink_channel *qcom_glink_alloc_channel(struct qcom_glink *glink, init_completion(&channel->open_req); init_completion(&channel->open_ack); + init_completion(&channel->intent_req_comp); INIT_LIST_HEAD(&channel->done_intents); INIT_WORK(&channel->intent_work, qcom_glink_rx_done_work); From c9973b3d35267028edf8b9a4df6310161bd0ef37 Mon Sep 17 00:00:00 2001 From: Liang Chen Date: Mon, 30 Oct 2017 14:46:35 -0700 Subject: [PATCH 0478/1013] bcache: explicitly destroy mutex while exiting [ Upstream commit 330a4db89d39a6b43f36da16824eaa7a7509d34d ] mutex_destroy does nothing most of time, but it's better to call it to make the code future proof and it also has some meaning for like mutex debug. As Coly pointed out in a previous review, bcache_exit() may not be able to handle all the references properly if userspace registers cache and backing devices right before bch_debug_init runs and bch_debug_init failes later. So not exposing userspace interface until everything is ready to avoid that issue. Signed-off-by: Liang Chen Reviewed-by: Michael Lyle Reviewed-by: Coly Li Reviewed-by: Eric Wheeler Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/bcache/super.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c index fc0a31b13ac4..25bf003fb198 100644 --- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -2085,6 +2085,7 @@ static void bcache_exit(void) if (bcache_major) unregister_blkdev(bcache_major, "bcache"); unregister_reboot_notifier(&reboot); + mutex_destroy(&bch_register_lock); } static int __init bcache_init(void) @@ -2103,14 +2104,15 @@ static int __init bcache_init(void) bcache_major = register_blkdev(0, "bcache"); if (bcache_major < 0) { unregister_reboot_notifier(&reboot); + mutex_destroy(&bch_register_lock); return bcache_major; } if (!(bcache_wq = alloc_workqueue("bcache", WQ_MEM_RECLAIM, 0)) || !(bcache_kobj = kobject_create_and_add("bcache", fs_kobj)) || - sysfs_create_files(bcache_kobj, files) || bch_request_init() || - bch_debug_init(bcache_kobj)) + bch_debug_init(bcache_kobj) || + sysfs_create_files(bcache_kobj, files)) goto err; return 0; From d5860344c243308c0d5e9751553d52d957b29e5b Mon Sep 17 00:00:00 2001 From: "tang.junhui" Date: Mon, 30 Oct 2017 14:46:34 -0700 Subject: [PATCH 0479/1013] bcache: fix wrong cache_misses statistics [ Upstream commit c157313791a999646901b3e3c6888514ebc36d62 ] Currently, Cache missed IOs are identified by s->cache_miss, but actually, there are many situations that missed IOs are not assigned a value for s->cache_miss in cached_dev_cache_miss(), for example, a bypassed IO (s->iop.bypass = 1), or the cache_bio allocate failed. In these situations, it will go to out_put or out_submit, and s->cache_miss is null, which leads bch_mark_cache_accounting() to treat this IO as a hit IO. 
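Put differently (field and function names as in the hunk below), the hit/miss statistic now keys off a flag that every miss path sets, rather than the cache_miss pointer, which stays NULL for bypassed IOs or failed cache_bio allocations:

    s->cache_missed = 1;                   /* set on entry to the miss path */
    ...
    bch_mark_cache_accounting(s->iop.c, s->d,
                              !s->cache_missed,   /* hit only if never missed */
                              s->iop.bypass);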
[ML: applied by 3-way merge] Signed-off-by: tang.junhui Reviewed-by: Michael Lyle Reviewed-by: Coly Li Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/bcache/request.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index 14d13cab5cda..e9fbf2bcd122 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -463,6 +463,7 @@ struct search { unsigned recoverable:1; unsigned write:1; unsigned read_dirty_data:1; + unsigned cache_missed:1; unsigned long start_time; @@ -649,6 +650,7 @@ static inline struct search *search_alloc(struct bio *bio, s->orig_bio = bio; s->cache_miss = NULL; + s->cache_missed = 0; s->d = d; s->recoverable = 1; s->write = op_is_write(bio_op(bio)); @@ -767,7 +769,7 @@ static void cached_dev_read_done_bh(struct closure *cl) struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); bch_mark_cache_accounting(s->iop.c, s->d, - !s->cache_miss, s->iop.bypass); + !s->cache_missed, s->iop.bypass); trace_bcache_read(s->orig_bio, !s->cache_miss, s->iop.bypass); if (s->iop.status) @@ -786,6 +788,8 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s, struct cached_dev *dc = container_of(s->d, struct cached_dev, disk); struct bio *miss, *cache_bio; + s->cache_missed = 1; + if (s->cache_miss || s->iop.bypass) { miss = bio_next_split(bio, sectors, GFP_NOIO, s->d->bio_split); ret = miss == bio ? MAP_DONE : MAP_CONTINUE; From 7d1beb462e18b55fa5d0990d0f60fbf4edfda61b Mon Sep 17 00:00:00 2001 From: Patel Jay P Date: Mon, 23 Oct 2017 06:05:53 -0700 Subject: [PATCH 0480/1013] Ib/hfi1: Return actual operational VLs in port info query [ Upstream commit 00f9203119dd2774564407c7a67b17d81916298b ] __subn_get_opa_portinfo stores value returned by hfi1_get_ib_cfg() as operational vls. hfi1_get_ib_cfg() returns vls_operational field in hfi1_pportdata. The problem with this is that the value is always equal to vls_supported field in hfi1_pportdata. The logic to calculate operational_vls is to set value passed by FM (in __subn_set_opa_portinfo routine). If no value is passed then default value is stored in operational_vls. Field actual_vls_operational is calculated on the basis of buffer control table. Hence, modifying hfi1_get_ib_cfg() to return actual_operational_vls when used with HFI1_IB_CFG_OP_VLS parameter Reviewed-by: Mike Marciniszyn Reviewed-by: Dennis Dalessandro Signed-off-by: Patel Jay P Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/chip.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/hfi1/chip.c b/drivers/infiniband/hw/hfi1/chip.c index 312444386f54..0e17d03ef1cb 100644 --- a/drivers/infiniband/hw/hfi1/chip.c +++ b/drivers/infiniband/hw/hfi1/chip.c @@ -9952,7 +9952,7 @@ int hfi1_get_ib_cfg(struct hfi1_pportdata *ppd, int which) goto unimplemented; case HFI1_IB_CFG_OP_VLS: - val = ppd->vls_operational; + val = ppd->actual_vls_operational; break; case HFI1_IB_CFG_VL_HIGH_CAP: /* VL arb high priority table size */ val = VL_ARB_HIGH_PRIO_TABLE_SIZE; From e508e6026b198b9879e0d1850fdef8ef1d4d0805 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ronald=20Tschal=C3=A4r?= Date: Wed, 25 Oct 2017 22:15:19 -0700 Subject: [PATCH 0481/1013] Bluetooth: hci_ldisc: Fix another race when closing the tty. 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 0338b1b393ec7910898e8f7b25b3bf31a7282e16 ] The following race condition still existed: P1 P2 cancel_work_sync() hci_uart_tx_wakeup() hci_uart_write_work() hci_uart_dequeue() clear_bit(HCI_UART_PROTO_READY) hci_unregister_dev(hdev) hci_free_dev(hdev) hu->proto->close(hu) kfree(hu) access to hdev and hu Cancelling the work after clearing the HCI_UART_PROTO_READY bit avoids this as any hci_uart_tx_wakeup() issued after the flag is cleared will detect that and not schedule further work. Signed-off-by: Ronald Tschalär Reviewed-by: Lukas Wunner Signed-off-by: Marcel Holtmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/hci_ldisc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index a746627e784e..6e2403805784 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -510,13 +510,13 @@ static void hci_uart_tty_close(struct tty_struct *tty) if (hdev) hci_uart_close(hdev); - cancel_work_sync(&hu->write_work); - if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) { write_lock_irqsave(&hu->proto_lock, flags); clear_bit(HCI_UART_PROTO_READY, &hu->flags); write_unlock_irqrestore(&hu->proto_lock, flags); + cancel_work_sync(&hu->write_work); + if (hdev) { if (test_bit(HCI_UART_REGISTERED, &hu->flags)) hci_unregister_dev(hdev); From 7bd6bf08dd5b30dbe0696a7387dedc65a5f9a3e6 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Fri, 27 Oct 2017 09:33:41 -0700 Subject: [PATCH 0482/1013] arm64: prevent regressions in compressed kernel image size when upgrading to binutils 2.27 [ Upstream commit fd9dde6abcb9bfe6c6bee48834e157999f113971 ] Upon upgrading to binutils 2.27, we found that our lz4 and gzip compressed kernel images were significantly larger, resulting is 10ms boot time regressions. As noted by Rahul: "aarch64 binaries uses RELA relocations, where each relocation entry includes an addend value. This is similar to x86_64. On x86_64, the addend values are also stored at the relocation offset for relative relocations. This is an optimization: in the case where code does not need to be relocated, the loader can simply skip processing relative relocations. In binutils-2.25, both bfd and gold linkers did this for x86_64, but only the gold linker did this for aarch64. The kernel build here is using the bfd linker, which stored zeroes at the relocation offsets for relative relocations. Since a set of zeroes compresses better than a set of non-zero addend values, this behavior was resulting in much better lz4 compression. The bfd linker in binutils-2.27 is now storing the actual addend values at the relocation offsets. The behavior is now consistent with what it does for x86_64 and what gold linker does for both architectures. The change happened in this upstream commit: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1f56df9d0d5ad89806c24e71f296576d82344613 Since a bunch of zeroes got replaced by non-zero addend values, we see the side effect of lz4 compressed image being a bit bigger. To get the old behavior from the bfd linker, "--no-apply-dynamic-relocs" flag can be used: $ LDFLAGS="--no-apply-dynamic-relocs" make With this flag, the compressed image size is back to what it was with binutils-2.25. 
If the kernel is using ASLR, there aren't additional runtime costs to --no-apply-dynamic-relocs, as the relocations will need to be applied again anyway after the kernel is relocated to a random address. If the kernel is not using ASLR, then presumably the current default behavior of the linker is better. Since the static linker performed the dynamic relocs, and the kernel is not moved to a different address at load time, it can skip applying the relocations all over again." Some measurements: $ ld -v GNU ld (binutils-2.25-f3d35cf6) 2.25.51.20141117 ^ $ ls -l vmlinux -rwxr-x--- 1 ndesaulniers eng 300652760 Oct 26 11:57 vmlinux $ ls -l Image.lz4-dtb -rw-r----- 1 ndesaulniers eng 16932627 Oct 26 11:57 Image.lz4-dtb $ ld -v GNU ld (binutils-2.27-53dd00a1) 2.27.0.20170315 ^ pre patch: $ ls -l vmlinux -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 11:43 vmlinux $ ls -l Image.lz4-dtb -rw-r----- 1 ndesaulniers eng 18159474 Oct 26 11:43 Image.lz4-dtb post patch: $ ls -l vmlinux -rwxr-x--- 1 ndesaulniers eng 300376208 Oct 26 12:06 vmlinux $ ls -l Image.lz4-dtb -rw-r----- 1 ndesaulniers eng 16932466 Oct 26 12:06 Image.lz4-dtb By Siqi's measurement w/ gzip: binutils 2.27 with this patch (with --no-apply-dynamic-relocs): Image 41535488 Image.gz 13404067 binutils 2.27 without this patch (without --no-apply-dynamic-relocs): Image 41535488 Image.gz 14125516 Any compression scheme should be able to get better results from the longer runs of zeros, not just GZIP and LZ4. 10ms boot time savings isn't anything to get excited about, but users of arm64+compression+bfd-2.27 should not have to pay a penalty for no runtime improvement. Reported-by: Gopinath Elanchezhian Reported-by: Sindhuri Pentyala Reported-by: Wei Wang Suggested-by: Ard Biesheuvel Suggested-by: Rahul Chaudhry Suggested-by: Siqi Lin Suggested-by: Stephen Hines Signed-off-by: Nick Desaulniers Reviewed-by: Ard Biesheuvel [will: added comment to Makefile] Signed-off-by: Will Deacon Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm64/Makefile | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 3eb4397150df..7318165cfc90 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -14,8 +14,12 @@ LDFLAGS_vmlinux :=-p --no-undefined -X CPPFLAGS_vmlinux.lds = -DTEXT_OFFSET=$(TEXT_OFFSET) GZFLAGS :=-9 -ifneq ($(CONFIG_RELOCATABLE),) -LDFLAGS_vmlinux += -pie -shared -Bsymbolic +ifeq ($(CONFIG_RELOCATABLE), y) +# Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour +# for relative relocs, since this leads to better Image compression +# with the relocation offsets always being zero. +LDFLAGS_vmlinux += -pie -shared -Bsymbolic \ + $(call ld-option, --no-apply-dynamic-relocs) endif ifeq ($(CONFIG_ARM64_ERRATUM_843419),y) From 4bcbfac98d517eae6d12257aed923347be16930c Mon Sep 17 00:00:00 2001 From: Anand Jain Date: Sat, 14 Oct 2017 08:34:02 +0800 Subject: [PATCH 0483/1013] btrfs: fix false EIO for missing device MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 102ed2c5ff932439bbbe74c7bd63e6d5baa9f732 ] When one of the device is missing, bbio_error() takes care of setting the error status. And if its only IO that is pending in that stripe, it fails to check the status of the other IO at %bbio_error before setting the error %bi_status for the %orig_bio. Fix this by checking if %bbio->error has exceeded the %bbio->max_errors. Reproducer as below fdatasync error is seen intermittently. 
mount -o degraded /dev/sdc /btrfs dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error The reason for the intermittences of the problem is because the following conditions have to be met, which depends on timing: In btrfs_map_bio() - the RAID1 the missing device has to be at %dev_nr = 1 In bbio_error() . before bbio_error() is called the bio of the not-missing device at %dev_nr = 0 must be completed so that the below condition is true if (atomic_dec_and_test(&bbio->stripes_pending)) { Signed-off-by: Anand Jain Reviewed-by: Liu Bo Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/volumes.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index b39737568c22..abc57f7b2716 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -6144,7 +6144,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical) btrfs_io_bio(bio)->mirror_num = bbio->mirror_num; bio->bi_iter.bi_sector = logical >> 9; - bio->bi_status = BLK_STS_IOERR; + if (atomic_read(&bbio->error) > bbio->max_errors) + bio->bi_status = BLK_STS_IOERR; + else + bio->bi_status = BLK_STS_OK; btrfs_end_bbio(bbio, bio); } } From 9e87c49d62f4c5b9ae4a56b8f62865c4712c8d4d Mon Sep 17 00:00:00 2001 From: Nikolay Borisov Date: Thu, 28 Sep 2017 10:53:17 +0300 Subject: [PATCH 0484/1013] btrfs: Explicitly handle btrfs_update_root failure [ Upstream commit 9417ebc8a676487c6ec8825f92fb28f7dbeb5f4b ] btrfs_udpate_root can fail and it aborts the transaction, the correct way to handle an aborted transaction is to explicitly end with btrfs_end_transaction. Even now the code is correct since btrfs_commit_transaction would handle an aborted transaction but this is more of an implementation detail. So let's be explicit in handling failure in btrfs_update_root. Furthermore btrfs_commit_transaction can also fail and by ignoring it's return value we could have left the in-memory copy of the root item in an inconsistent state. So capture the error value which allows us to correctly revert the RO/RW flags in case of commit failure. Signed-off-by: Nikolay Borisov Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/ioctl.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 6c7a49faf4e0..1f1338d52303 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -1842,8 +1842,13 @@ static noinline int btrfs_ioctl_subvol_setflags(struct file *file, ret = btrfs_update_root(trans, fs_info->tree_root, &root->root_key, &root->root_item); + if (ret < 0) { + btrfs_end_transaction(trans); + goto out_reset; + } + + ret = btrfs_commit_transaction(trans); - btrfs_commit_transaction(trans); out_reset: if (ret) btrfs_set_root_flags(&root->root_item, root_flags); From da76a65a011490a0b0bd9f50d872f26e629d1eed Mon Sep 17 00:00:00 2001 From: Anand Jain Date: Thu, 28 Sep 2017 14:51:09 +0800 Subject: [PATCH 0485/1013] btrfs: undo writable superblocke when sprouting fails [ Upstream commit 0af2c4bf5a012a40a2f9230458087d7f068339d0 ] When new device is being added to seed FS, seed FS is marked writable, but when we fail to bring in the new device, we missed to undo the writable part. This patch fixes it. 
Signed-off-by: Anand Jain Reviewed-by: Nikolay Borisov Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/volumes.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index abc57f7b2716..0c11121a8ace 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2501,6 +2501,8 @@ int btrfs_init_new_device(struct btrfs_fs_info *fs_info, const char *device_path return ret; error_trans: + if (seeding_dev) + sb->s_flags |= MS_RDONLY; btrfs_end_transaction(trans); rcu_string_free(device->name); btrfs_sysfs_rm_device_link(fs_info->fs_devices, device); From 864a5fb1c6066ade93a640a279bcd191952e4181 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Mon, 11 Sep 2017 16:15:28 +0100 Subject: [PATCH 0486/1013] btrfs: avoid null pointer dereference on fs_info when calling btrfs_crit [ Upstream commit 3993b112dac968612b0b213ed59cb30f50b0015b ] There are checks on fs_info in __btrfs_panic to avoid dereferencing a null fs_info, however, there is a call to btrfs_crit that may also dereference a null fs_info. Fix this by adding a check to see if fs_info is null and only print the s_id if fs_info is non-null. Detected by CoverityScan CID#401973 ("Dereference after null check") Fixes: efe120a067c8 ("Btrfs: convert printk to btrfs_ and fix BTRFS prefix") Signed-off-by: Colin Ian King Reviewed-by: David Sterba Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 161694b66038..e8f5e24325f3 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -202,7 +202,6 @@ static struct ratelimit_state printk_limits[] = { void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) { - struct super_block *sb = fs_info->sb; char lvl[PRINTK_MAX_SINGLE_HEADER_LEN + 1] = "\0"; struct va_format vaf; va_list args; @@ -228,7 +227,8 @@ void btrfs_printk(const struct btrfs_fs_info *fs_info, const char *fmt, ...) vaf.va = &args; if (__ratelimit(ratelimit)) - printk("%sBTRFS %s (device %s): %pV\n", lvl, type, sb->s_id, &vaf); + printk("%sBTRFS %s (device %s): %pV\n", lvl, type, + fs_info ? fs_info->sb->s_id : "", &vaf); va_end(args); } From c97df8e004de63e50384706d00adb590adcef054 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 10 Sep 2017 13:19:38 +0200 Subject: [PATCH 0487/1013] btrfs: tests: Fix a memory leak in error handling path in 'run_test()' [ Upstream commit 9ca2e97fa3c3216200afe35a3b111ec51cc796d2 ] If 'btrfs_alloc_path()' fails, we must free the resources already allocated, as done in the other error handling paths in this function. 
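The underlying idiom, as a small self-contained sketch (the names are made up; only the shape matters): once an earlier allocation has succeeded, later failures have to unwind through the common cleanup label instead of returning directly.

    #include <errno.h>
    #include <stdlib.h>

    int run_test_sketch(void)
    {
            int ret = 0;
            void *cache = NULL, *path = NULL;

            cache = malloc(64);
            if (!cache)
                    return -ENOMEM;   /* nothing else allocated yet */

            path = malloc(64);
            if (!path) {
                    ret = -ENOMEM;
                    goto out;         /* 'cache' still has to be released */
            }
            /* ... exercise cache and path ... */
    out:
            free(path);               /* free(NULL) is a no-op */
            free(cache);
            return ret;
    }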
Signed-off-by: Christophe JAILLET Reviewed-by: Qu Wenruo Signed-off-by: David Sterba Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/tests/free-space-tree-tests.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/tests/free-space-tree-tests.c b/fs/btrfs/tests/free-space-tree-tests.c index 1458bb0ea124..8444a018cca2 100644 --- a/fs/btrfs/tests/free-space-tree-tests.c +++ b/fs/btrfs/tests/free-space-tree-tests.c @@ -500,7 +500,8 @@ static int run_test(test_func_t test_func, int bitmaps, u32 sectorsize, path = btrfs_alloc_path(); if (!path) { test_msg("Couldn't allocate path\n"); - return -ENOMEM; + ret = -ENOMEM; + goto out; } ret = add_block_group_free_space(&trans, root->fs_info, cache); From bca4d7e1a19cfdde2a4a340339e46248a691b196 Mon Sep 17 00:00:00 2001 From: Sergey Matyukevich Date: Mon, 30 Oct 2017 13:13:46 +0300 Subject: [PATCH 0488/1013] qtnfmac: modify full Tx queue error reporting [ Upstream commit e9931f984dd1e80adb3b5e095ef175fe383bc92d ] Under heavy load it is normal that h/w Tx queue is almost full all the time and reclaim should be done before transmitting next packet. Warning still should be reported as well as s/w Tx queues should be stopped in the case when reclaim failed. Signed-off-by: Sergey Matyukevich Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c b/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c index 69131965a298..146e42a132e7 100644 --- a/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c +++ b/drivers/net/wireless/quantenna/qtnfmac/pearl/pcie.c @@ -643,11 +643,11 @@ static int qtnf_tx_queue_ready(struct qtnf_pcie_bus_priv *priv) { if (!CIRC_SPACE(priv->tx_bd_w_index, priv->tx_bd_r_index, priv->tx_bd_num)) { - pr_err_ratelimited("reclaim full Tx queue\n"); qtnf_pcie_data_tx_reclaim(priv); if (!CIRC_SPACE(priv->tx_bd_w_index, priv->tx_bd_r_index, priv->tx_bd_num)) { + pr_warn_ratelimited("reclaim full Tx queue\n"); priv->tx_full_count++; return 0; } From 000a335b2d3828382bb4612ceee176a5cbd05752 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Thu, 26 Oct 2017 17:12:33 +0200 Subject: [PATCH 0489/1013] mtd: spi-nor: stm32-quadspi: Fix uninitialized error return code MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 05521bd3d117704a1458eb4d0c3ae821858658f2 ] With gcc 4.1.2: drivers/mtd/spi-nor/stm32-quadspi.c: In function ‘stm32_qspi_tx_poll’: drivers/mtd/spi-nor/stm32-quadspi.c:230: warning: ‘ret’ may be used uninitialized in this function Indeed, if stm32_qspi_cmd.len is zero, ret will be uninitialized. This length is passed from outside the driver using the spi_nor.{read,write}{,_reg}() callbacks. Several functions in drivers/mtd/spi-nor/spi-nor.c (e.g. write_enable(), write_disable(), and erase_chip()) call spi_nor.write_reg() with a zero length. Fix this by returning an explicit zero on success. 
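A stripped-down illustration (the helpers are hypothetical stand-ins, not the driver code): with a zero length the loop body never runs, so only an explicit final return value keeps the result defined.

    static int tx_poll_sketch(const u8 *buf, unsigned int len)
    {
            int ret;                        /* never assigned when len == 0 */

            while (len--) {
                    ret = wait_fifo_ready();    /* hypothetical helper */
                    if (ret)
                            return ret;         /* error paths still propagate */
                    push_byte(*buf++);          /* hypothetical helper */
            }
            return 0;           /* the fix: report success explicitly */
    }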
Fixes: 0d43d7ab277a048c ("mtd: spi-nor: add driver for STM32 quad spi flash controller") Signed-off-by: Geert Uytterhoeven Acked-by: Ludovic Barre Signed-off-by: Cyrille Pitchen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/spi-nor/stm32-quadspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/mtd/spi-nor/stm32-quadspi.c b/drivers/mtd/spi-nor/stm32-quadspi.c index 86c0931543c5..ad6a3e1844cb 100644 --- a/drivers/mtd/spi-nor/stm32-quadspi.c +++ b/drivers/mtd/spi-nor/stm32-quadspi.c @@ -240,12 +240,12 @@ static int stm32_qspi_tx_poll(struct stm32_qspi *qspi, STM32_QSPI_FIFO_TIMEOUT_US); if (ret) { dev_err(qspi->dev, "fifo timeout (stat:%#x)\n", sr); - break; + return ret; } tx_fifo(buf++, qspi->io_base + QUADSPI_DR); } - return ret; + return 0; } static int stm32_qspi_tx_mm(struct stm32_qspi *qspi, From a54d17dba2ba4a1eaab65fd282f4b4c9bf80418b Mon Sep 17 00:00:00 2001 From: Neil Armstrong Date: Thu, 19 Oct 2017 12:31:09 +0200 Subject: [PATCH 0490/1013] ARM64: dts: meson-gxbb-odroidc2: fix usb1 power supply [ Upstream commit e841ec956e539f4002f5e9fe9f9e904dcca12d5d ] Looking at the schematics, the USB Power Supply is shared between the two USB interfaces, If the usb0 fails to initialize, the second one won't have power. Fixes: 5a0803bd5ae2 ("ARM64: dts: meson-gxbb-odroidc2: Enable USB Nodes") Signed-off-by: Neil Armstrong Signed-off-by: Kevin Hilman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts index 1ffa1c238a72..08b7bb7f5b74 100644 --- a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts +++ b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts @@ -301,6 +301,7 @@ &usb1_phy { status = "okay"; + phy-supply = <&usb_otg_pwr>; }; &usb0 { From c0f98c0dbced1c1b48fd4427075f8a5295b402fb Mon Sep 17 00:00:00 2001 From: Bartosz Chronowski Date: Thu, 26 Oct 2017 10:22:43 +0200 Subject: [PATCH 0491/1013] Bluetooth: btusb: Add new NFA344A entry. [ Upstream commit 858ff38af77fc660092e82474ecc6ac135ed29fe ] This change allows proper low power mode entry in suspend. /sys/kernel/debug/usb/devices entry: T: Bus=01 Lev=01 Prnt=01 Port=05 Cnt=03 Dev#= 3 Spd=12 MxCh= 0 D: Ver= 2.01 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=0489 ProdID=e09f Rev= 0.01 C:* #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) 
MxPS= 16 Ivl=1ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=03(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms Signed-off-by: Bartosz Chronowski Signed-off-by: Marcel Holtmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/btusb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 7a5c06aaa181..513a7a59d421 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -272,6 +272,7 @@ static const struct usb_device_id blacklist_table[] = { { USB_DEVICE(0x0cf3, 0xe301), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x0cf3, 0xe360), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x0489, 0xe092), .driver_info = BTUSB_QCA_ROME }, + { USB_DEVICE(0x0489, 0xe09f), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x0489, 0xe0a2), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x04ca, 0x3011), .driver_info = BTUSB_QCA_ROME }, { USB_DEVICE(0x04ca, 0x3016), .driver_info = BTUSB_QCA_ROME }, From 06f430379273700696eef22159117789853a38e3 Mon Sep 17 00:00:00 2001 From: Tushar Dave Date: Fri, 27 Oct 2017 16:12:30 -0700 Subject: [PATCH 0492/1013] samples/bpf: adjust rlimit RLIMIT_MEMLOCK for xdp1 [ Upstream commit 6dfca831c03ef654b1f7bff1b8d487d330e9f76b ] Default rlimit RLIMIT_MEMLOCK is 64KB, causes bpf map failure. e.g. [root@lab bpf]#./xdp1 -N $( Acked-by: Alexei Starovoitov Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- samples/bpf/xdp1_user.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c index 2431c0321b71..fdaefe91801d 100644 --- a/samples/bpf/xdp1_user.c +++ b/samples/bpf/xdp1_user.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "bpf_load.h" #include "bpf_util.h" @@ -69,6 +70,7 @@ static void usage(const char *prog) int main(int argc, char **argv) { + struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY}; const char *optstr = "SN"; char filename[256]; int opt; @@ -91,6 +93,12 @@ int main(int argc, char **argv) usage(basename(argv[0])); return 1; } + + if (setrlimit(RLIMIT_MEMLOCK, &r)) { + perror("setrlimit(RLIMIT_MEMLOCK)"); + return 1; + } + ifindex = strtoul(argv[optind], NULL, 0); snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]); From 6bd7a9bc48ac642cb3ded1e40fcfee44c8a0aa1a Mon Sep 17 00:00:00 2001 From: Felix Manlunas Date: Thu, 26 Oct 2017 16:46:36 -0700 Subject: [PATCH 0493/1013] liquidio: fix kernel panic in VF driver [ Upstream commit aa28667cfbe4ff6f14454dda210b1f2e485f99b5 ] Doing ifconfig down on VF driver in the middle of receiving line rate traffic causes a kernel panic: LiquidIO_VF 0000:02:00.3: should not come here should not get rx when poll mode = 0 for vf BUG: unable to handle kernel NULL pointer dereference at (null) . . . Call Trace: ? tasklet_action+0x102/0x120 __do_softirq+0x91/0x292 irq_exit+0xb6/0xc0 do_IRQ+0x4f/0xd0 common_interrupt+0x93/0x93 RIP: 0010:cpuidle_enter_state+0x142/0x2f0 RSP: 0018:ffffffffa6403e20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff59 RAX: 0000000000000000 RBX: 0000000000000003 RCX: 000000000000001f RDX: 0000000000000000 RSI: 000000002ab7519f RDI: 0000000000000000 RBP: ffffffffa6403e58 R08: 0000000000000084 R09: 0000000000000018 R10: ffffffffa6403df0 R11: 00000000000003c7 R12: 0000000000000003 R13: ffffd27ebd806800 R14: ffffffffa64d40d8 R15: 0000007be072823f cpuidle_enter+0x17/0x20 call_cpuidle+0x23/0x40 do_idle+0x18c/0x1f0 cpu_startup_entry+0x64/0x70 rest_init+0xa5/0xb0 start_kernel+0x45e/0x46b x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x6f/0x72 secondary_startup_64+0xa5/0xa5 Code: Bad RIP value. RIP: (null) RSP: ffff9246ed003f28 CR2: 0000000000000000 ---[ end trace 92731e80f31b7d7d ]--- Kernel panic - not syncing: Fatal exception in interrupt Kernel Offset: 0x24000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) ---[ end Kernel panic - not syncing: Fatal exception in interrupt Reason is: in the function assigned to net_device_ops->ndo_stop, the steps for bringing down the interface are done in the wrong order. The step that notifies the NIC firmware to stop forwarding packets to host is done too late. Fix it by moving that step to the beginning. Signed-off-by: Felix Manlunas Signed-off-by: Raghu Vatsavayi Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c index 2e993ce43b66..4d2db22e011b 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c @@ -1289,6 +1289,9 @@ static int liquidio_stop(struct net_device *netdev) struct octeon_device *oct = lio->oct_dev; struct napi_struct *napi, *n; + /* tell Octeon to stop forwarding packets to host */ + send_rx_ctrl_cmd(lio, 0); + if (oct->props[lio->ifidx].napi_enabled) { list_for_each_entry_safe(napi, n, &netdev->napi_list, dev_list) napi_disable(napi); @@ -1306,9 +1309,6 @@ static int liquidio_stop(struct net_device *netdev) netif_carrier_off(netdev); lio->link_changes++; - /* tell Octeon to stop forwarding packets to host */ - send_rx_ctrl_cmd(lio, 0); - ifstate_reset(lio, LIO_IFSTATE_RUNNING); txqs_stop(netdev); From 21cd9fe750941100b7b4643ffd765215b6daff64 Mon Sep 17 00:00:00 2001 From: Osama Khan Date: Sat, 21 Oct 2017 10:42:21 +0000 Subject: [PATCH 0494/1013] platform/x86: hp_accel: Add quirk for HP ProBook 440 G4 [ Upstream commit 163ca80013aafb6dc9cb295de3db7aeab9ab43f8 ] Added support for HP ProBook 440 G4 laptops by including the accelerometer orientation quirk for that device. Testing was performed based on the axis orientation guidelines here: https://www.kernel.org/doc/Documentation/misc-devices/lis3lv02d which states "If the left side is elevated, X increases (becomes positive)". When tested, on lifting the left edge, x values became increasingly negative thus indicating an inverted x-axis on the installed lis3lv02d chip. This was compensated by adding an entry for this device in hp_accel.c specifying the quirk as x_inverted. The patch was tested on a ProBook 440 G4 device and x-axis as well as y and z-axis values are now generated as per spec. Signed-off-by: Osama Khan Signed-off-by: Andy Shevchenko Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/platform/x86/hp_accel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/platform/x86/hp_accel.c b/drivers/platform/x86/hp_accel.c index 493d8910a74e..7b12abe86b94 100644 --- a/drivers/platform/x86/hp_accel.c +++ b/drivers/platform/x86/hp_accel.c @@ -240,6 +240,7 @@ static const struct dmi_system_id lis3lv02d_dmi_ids[] = { AXIS_DMI_MATCH("HDX18", "HP HDX 18", x_inverted), AXIS_DMI_MATCH("HPB432x", "HP ProBook 432", xy_rotated_left), AXIS_DMI_MATCH("HPB440G3", "HP ProBook 440 G3", x_inverted_usd), + AXIS_DMI_MATCH("HPB440G4", "HP ProBook 440 G4", x_inverted), AXIS_DMI_MATCH("HPB442x", "HP ProBook 442", xy_rotated_left), AXIS_DMI_MATCH("HPB452x", "HP ProBook 452", y_inverted), AXIS_DMI_MATCH("HPB522x", "HP ProBook 522", xy_swap), From 4edcbfc5651276a28a1501e03198f6f16974617c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 18 Oct 2017 13:20:01 +0200 Subject: [PATCH 0495/1013] nvme: use kref_get_unless_zero in nvme_find_get_ns [ Upstream commit 2dd4122854f697afc777582d18548dded03ce5dd ] For kref_get_unless_zero to protect against lookup vs free races we need to use it in all places where we aren't guaranteed to already hold a reference. There is no such guarantee in nvme_find_get_ns, so switch to kref_get_unless_zero in this function. 
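Annotated restatement of the fixed lookup (same logic as the hunk below; only the comment is added):

    mutex_lock(&ctrl->namespaces_mutex);
    list_for_each_entry(ns, &ctrl->namespaces, list) {
            if (ns->ns_id == nsid) {
                    /* No reference is guaranteed to be held here, so the
                     * count may already have dropped to zero on the
                     * teardown side; only bump it if it is still live. */
                    if (!kref_get_unless_zero(&ns->kref))
                            continue;
                    ret = ns;
                    break;
            }
    }
    mutex_unlock(&ctrl->namespaces_mutex);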
Signed-off-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Reviewed-by: Hannes Reinecke Reviewed-by: Johannes Thumshirn Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/nvme/host/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 37f9039bb9ca..0655f45643d9 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2299,7 +2299,8 @@ static struct nvme_ns *nvme_find_get_ns(struct nvme_ctrl *ctrl, unsigned nsid) mutex_lock(&ctrl->namespaces_mutex); list_for_each_entry(ns, &ctrl->namespaces, list) { if (ns->ns_id == nsid) { - kref_get(&ns->kref); + if (!kref_get_unless_zero(&ns->kref)) + continue; ret = ns; break; } From ff62605e0d79c870f679bcd8571bf9cb5f7e1c93 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Wed, 25 Oct 2017 15:57:55 +0200 Subject: [PATCH 0496/1013] l2tp: cleanup l2tp_tunnel_delete calls [ Upstream commit 4dc12ffeaeac939097a3f55c881d3dc3523dff0c ] l2tp_tunnel_delete does not return anything since commit 62b982eeb458 ("l2tp: fix race condition in l2tp_tunnel_delete"). But call sites of l2tp_tunnel_delete still do casts to void to avoid unused return value warnings. Kill these now useless casts. Signed-off-by: Jiri Slaby Cc: Sabrina Dubroca Cc: Guillaume Nault Cc: David S. Miller Cc: netdev@vger.kernel.org Acked-by: Guillaume Nault Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/l2tp/l2tp_core.c | 2 +- net/l2tp/l2tp_netlink.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c index 02d61101b108..af22aa8ae35b 100644 --- a/net/l2tp/l2tp_core.c +++ b/net/l2tp/l2tp_core.c @@ -1891,7 +1891,7 @@ static __net_exit void l2tp_exit_net(struct net *net) rcu_read_lock_bh(); list_for_each_entry_rcu(tunnel, &pn->l2tp_tunnel_list, list) { - (void)l2tp_tunnel_delete(tunnel); + l2tp_tunnel_delete(tunnel); } rcu_read_unlock_bh(); diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c index 7135f4645d3a..c28223d8092b 100644 --- a/net/l2tp/l2tp_netlink.c +++ b/net/l2tp/l2tp_netlink.c @@ -282,7 +282,7 @@ static int l2tp_nl_cmd_tunnel_delete(struct sk_buff *skb, struct genl_info *info l2tp_tunnel_notify(&l2tp_nl_family, info, tunnel, L2TP_CMD_TUNNEL_DELETE); - (void) l2tp_tunnel_delete(tunnel); + l2tp_tunnel_delete(tunnel); l2tp_tunnel_dec_refcount(tunnel); From 742b570da6e6447e4a98acceb27e8a7d5b436d7c Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Thu, 26 Oct 2017 09:31:16 -0700 Subject: [PATCH 0497/1013] xfs: fix log block underflow during recovery cycle verification [ Upstream commit 9f2a4505800607e537e9dd9dea4f55c4b0c30c7a ] It is possible for mkfs to format very small filesystems with too small of an internal log with respect to the various minimum size and block count requirements. If this occurs when the log happens to be smaller than the scan window used for cycle verification and the scan wraps the end of the log, the start_blk calculation in xlog_find_head() underflows and leads to an attempt to scan an invalid range of log blocks. This results in log recovery failure and a failed mount. Since there may be filesystems out in the wild with this kind of geometry, we cannot simply refuse to mount. Instead, cap the scan window for cycle verification to the size of the physical log. This ensures that the cycle verification proceeds as expected when the scan wraps the end of the log. 
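A worked example with made-up numbers (the formula is the wrap-around branch of xlog_find_head(), slightly simplified):

    scan window  XLOG_TOTAL_REC_SHIFT(log) = 1024 blocks
    physical log log_bbnum                 =  512 blocks  (undersized mkfs)
    head_blk                               =  100

    before:  num_scan_bblks = 1024
             start_blk = log_bbnum - (num_scan_bblks - head_blk)
                       = 512 - (1024 - 100) = -412      -> invalid range
    after:   num_scan_bblks = min(log_bbnum, 1024) = 512
             start_blk = 512 - (512 - 100) = 100        -> stays in the log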
Reported-by: Zorro Lang Signed-off-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/xfs/xfs_log_recover.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c index ee34899396b2..d6e049fdd977 100644 --- a/fs/xfs/xfs_log_recover.c +++ b/fs/xfs/xfs_log_recover.c @@ -753,7 +753,7 @@ xlog_find_head( * in the in-core log. The following number can be made tighter if * we actually look at the block size of the filesystem. */ - num_scan_bblks = XLOG_TOTAL_REC_SHIFT(log); + num_scan_bblks = min_t(int, log_bbnum, XLOG_TOTAL_REC_SHIFT(log)); if (head_blk >= num_scan_bblks) { /* * We are guaranteed that the entire check can be performed From 61bc71d34a00801657591e142f374794b6dc3548 Mon Sep 17 00:00:00 2001 From: "Darrick J. Wong" Date: Tue, 17 Oct 2017 21:37:32 -0700 Subject: [PATCH 0498/1013] xfs: return a distinct error code value for IGET_INCORE cache misses [ Upstream commit ed438b476b611c67089760037139f93ea8ed41d5 ] For an XFS_IGET_INCORE iget operation, if the inode isn't in the cache, return ENODATA so that we don't confuse it with the pre-existing ENOENT cases (inode is in cache, but freed). Signed-off-by: Darrick J. Wong Reviewed-by: Brian Foster Reviewed-by: Dave Chinner Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/xfs/xfs_icache.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 34227115a5d6..43005fbe8b1e 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -610,7 +610,7 @@ xfs_iget( } else { rcu_read_unlock(); if (flags & XFS_IGET_INCORE) { - error = -ENOENT; + error = -ENODATA; goto out_error_or_again; } XFS_STATS_INC(mp, xs_ig_missed); From 4be2d1ad59d0d9251bd1fc618a6d1266b16e3b75 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 17 Oct 2017 14:16:19 -0700 Subject: [PATCH 0499/1013] xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real [ Upstream commit 5e422f5e4fd71d18bc6b851eeb3864477b3d842e ] There was one spot in xfs_bmap_add_extent_unwritten_real that didn't use the passed in new extent state but always converted to normal, leading to wrong behavior when converting from normal to unwritten. Only found by code inspection, it seems like this code path to move partial extent from written to unwritten while merging it with the next extent is rarely exercised. Signed-off-by: Christoph Hellwig Reviewed-by: Brian Foster Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/xfs/libxfs/xfs_bmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 89263797cf32..a3cc8afed367 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -2560,7 +2560,7 @@ xfs_bmap_add_extent_unwritten_real( &i))) goto done; XFS_WANT_CORRUPTED_GOTO(mp, i == 0, done); - cur->bc_rec.b.br_state = XFS_EXT_NORM; + cur->bc_rec.b.br_state = new->br_state; if ((error = xfs_btree_insert(cur, &i))) goto done; XFS_WANT_CORRUPTED_GOTO(mp, i == 1, done); From 68a6765b70d90debde63bd26599c3422116751ef Mon Sep 17 00:00:00 2001 From: Egil Hjelmeland Date: Tue, 24 Oct 2017 17:14:10 +0200 Subject: [PATCH 0500/1013] net: dsa: lan9303: Do not disable switch fabric port 0 at .probe [ Upstream commit 3c91b0c1de8d013490bbc41ce9ee8810ea5baddd ] Make the LAN9303 work when lan9303_probe() is called twice. 
For some unknown reason the LAN9303 switch fail to forward data when switch fabric port 0 TX is disabled during probe. (Write of LAN9303_MAC_TX_CFG_0 in lan9303_disable_processing_port().) In that situation the switch fabric seem to receive frames, because the ALR is learning addresses. But no frames are transmitted on any of the ports. In our system lan9303_probe() is called twice, first time dsa_register_switch() return -EPROBE_DEFER. As an experiment, modified the code to skip writing LAN9303_MAC_TX_CFG_0, port 0 during the first probe. Then the switch works as expected. Resolve the problem by not calling lan9303_disable_processing_port() on port 0 during probe. Ports 1 and 2 are still disabled. Although unsatisfying that the exact failure mechanism is not known, the patch should not cause any harm. Signed-off-by: Egil Hjelmeland Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/dsa/lan9303-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c index b471413d3df9..1e5a69b9d90a 100644 --- a/drivers/net/dsa/lan9303-core.c +++ b/drivers/net/dsa/lan9303-core.c @@ -569,7 +569,7 @@ static int lan9303_disable_processing(struct lan9303 *chip) { int p; - for (p = 0; p < LAN9303_NUM_PORTS; p++) { + for (p = 1; p < LAN9303_NUM_PORTS; p++) { int ret = lan9303_disable_processing_port(chip, p); if (ret) From 0949f8afa8f1b9d6ebce48606e41dfcdafbaca4a Mon Sep 17 00:00:00 2001 From: Lipeng Date: Tue, 24 Oct 2017 21:02:11 +0800 Subject: [PATCH 0501/1013] net: hns3: fix a bug in hclge_uninit_client_instance [ Upstream commit a17dcf3f0124698d1120da71574bf4c339e5a368 ] HNS3 driver initialize hdev->roce_client and vport->roce.client in hclge_init_client_instance, and need set hdev->roce_client and vport->roce.client NULL. If do not set them NULL when uninit, it will fail in the scene: insmod hns3.ko, hns-roce.ko, hns-roce-hw-v3.ko successfully, but rmmod hns3.ko after rmmod hns-roce-hw-v2.ko and hns-roce.ko. This patch fixes the issue. Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support) Signed-off-by: Lipeng Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- .../net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index c1cdbfd83bdb..1df84a874d95 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -4007,13 +4007,19 @@ static void hclge_uninit_client_instance(struct hnae3_client *client, for (i = 0; i < hdev->num_vmdq_vport + 1; i++) { vport = &hdev->vport[i]; - if (hdev->roce_client) + if (hdev->roce_client) { hdev->roce_client->ops->uninit_instance(&vport->roce, 0); + hdev->roce_client = NULL; + vport->roce.client = NULL; + } if (client->type == HNAE3_CLIENT_ROCE) return; - if (client->ops->uninit_instance) + if (client->ops->uninit_instance) { client->ops->uninit_instance(&vport->nic, 0); + hdev->nic_client = NULL; + vport->nic.client = NULL; + } } } From 1d9205558e50b7af5fe5b054bf26f93c9255786e Mon Sep 17 00:00:00 2001 From: Lipeng Date: Tue, 24 Oct 2017 21:02:10 +0800 Subject: [PATCH 0502/1013] net: hns3: add nic_client check when initialize roce base information [ Upstream commit 3a46f34d20d453f09defb76b11a567647939c0aa ] Roce driver works base on HNS3 driver.If insmod Roce driver before NIC driver there is a error because do not check nic_client. This patch adds nic_client check when initialize roce base information. Fixes: 46a3df9 (net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support) Signed-off-by: Lipeng Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 1df84a874d95..a0ef97e7f3c9 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3981,7 +3981,7 @@ static int hclge_init_client_instance(struct hnae3_client *client, vport->roce.client = client; } - if (hdev->roce_client) { + if (hdev->roce_client && hdev->nic_client) { ret = hclge_init_roce_base_info(vport); if (ret) goto err; From 9d48d002a2f68e35bea2872a5533436848f30a60 Mon Sep 17 00:00:00 2001 From: Lipeng Date: Tue, 24 Oct 2017 21:02:09 +0800 Subject: [PATCH 0503/1013] net: hns3: fix the bug of hns3_set_txbd_baseinfo [ Upstream commit 7036d26f328f12a323069eb16d965055b4cb3795 ] The SC bits of TX BD mean switch control. For this area, value 0 indicates no switch control, the packet is routed according to the forwarding table. Value 1 indicates that the packet is transmitted to the network bypassing the forwarding table. As HNS3 driver need support VF later, VF conmunicate with its own PF need forwarding table. This patch sets SC bits of TX BD 0 and use forwarding table. Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC) Signed-off-by: Lipeng Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c index 981fc62ce043..52687f0691f1 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c @@ -721,7 +721,7 @@ static void hns3_set_txbd_baseinfo(u16 *bdtp_fe_sc_vld_ra_ri, int frag_end) HNS3_TXD_BDTYPE_M, 0); hnae_set_bit(*bdtp_fe_sc_vld_ra_ri, HNS3_TXD_FE_B, !!frag_end); hnae_set_bit(*bdtp_fe_sc_vld_ra_ri, HNS3_TXD_VLD_B, 1); - hnae_set_field(*bdtp_fe_sc_vld_ra_ri, HNS3_TXD_SC_M, HNS3_TXD_SC_S, 1); + hnae_set_field(*bdtp_fe_sc_vld_ra_ri, HNS3_TXD_SC_M, HNS3_TXD_SC_S, 0); } static int hns3_fill_desc(struct hns3_enet_ring *ring, void *priv, From 18c8d94eac0fd6c859033b2eb05941f19c3f7b4a Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 25 Oct 2017 07:41:11 +0300 Subject: [PATCH 0504/1013] RDMA/cxgb4: Declare stag as __be32 [ Upstream commit 35fb2a88ed4b77356fa679a8525c869a3594e287 ] The scqe.stag is actually __b32, fix it. drivers/infiniband/hw/cxgb4/cq.c:754:52: warning: cast to restricted __be32 Cc: Steve Wise Signed-off-by: Leon Romanovsky Reviewed-by: Steve Wise Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/cxgb4/t4.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/cxgb4/t4.h b/drivers/infiniband/hw/cxgb4/t4.h index e765c00303cd..bcb80ca67d3d 100644 --- a/drivers/infiniband/hw/cxgb4/t4.h +++ b/drivers/infiniband/hw/cxgb4/t4.h @@ -171,7 +171,7 @@ struct t4_cqe { __be32 msn; } rcqe; struct { - u32 stag; + __be32 stag; u16 nada2; u16 cidx; } scqe; From 96ed7ca7323be977d617d077146b322e844bdcb9 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Wed, 11 Oct 2017 15:35:56 -0600 Subject: [PATCH 0505/1013] PCI: Detach driver before procfs & sysfs teardown on device remove [ Upstream commit 16b6c8bb687cc3bec914de09061fcb8411951fda ] When removing a device, for example a VF being removed due to SR-IOV teardown, a "soft" hot-unplug via 'echo 1 > remove' in sysfs, or an actual hot-unplug, we first remove the procfs and sysfs attributes for the device before attempting to release the device from any driver bound to it. Unbinding the driver from the device can take time. The device might need to write out data or it might be actively in use. If it's in use by userspace through a vfio driver, the unbind might block until the user releases the device. This leads to a potentially non-trivial amount of time where the device exists, but we've torn down the interfaces that userspace uses to examine devices, for instance lspci might generate this sort of error: pcilib: Cannot open /sys/bus/pci/devices/0000:01:0a.3/config lspci: Unable to read the standard configuration space header of device 0000:01:0a.3 We don't seem to have any dependence on this teardown ordering in the kernel, so let's unbind the driver first, which is also more symmetric with the instantiation of the device in pci_bus_add_device(). 
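The resulting teardown order, condensed from the hunk below:

    if (dev->is_added) {
            device_release_driver(&dev->dev);  /* may block: flush I/O, or wait
                                                * for a vfio user to let go */
            pci_proc_detach_device(dev);       /* proc and sysfs stay usable
                                                * until the driver is gone */
            pci_remove_sysfs_dev_files(dev);
            dev->is_added = 0;
    }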
Signed-off-by: Alex Williamson Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/remove.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c index 73a03d382590..2fa0dbde36b7 100644 --- a/drivers/pci/remove.c +++ b/drivers/pci/remove.c @@ -19,9 +19,9 @@ static void pci_stop_dev(struct pci_dev *dev) pci_pme_active(dev, false); if (dev->is_added) { + device_release_driver(&dev->dev); pci_proc_detach_device(dev); pci_remove_sysfs_dev_files(dev); - device_release_driver(&dev->dev); dev->is_added = 0; } From b61aba91eb72efc5117d24d3f4455dcaf67f1e63 Mon Sep 17 00:00:00 2001 From: Xiaofei Tan Date: Tue, 24 Oct 2017 23:51:38 +0800 Subject: [PATCH 0506/1013] scsi: hisi_sas: fix the risk of freeing slot twice [ Upstream commit 6ba0fbc35aa9f3bc8c12be3b4047055c9ce2ac92 ] The function hisi_sas_slot_task_free() is used to free the slot and do tidy-up of LLDD resources. The LLDD generally should know the state of a slot and decide when to free it, and it should only be done once. For some scenarios, we really don't know the state, like when TMF timeout. In this case, we check task->lldd_task before calling hisi_sas_slot_task_free(). However, we may miss some scenarios when we should also check task->lldd_task, and it is not SMP safe to check task->lldd_task as we don't protect it within spin lock. This patch is to fix this risk of freeing slot twice, as follows: 1. Check task->lldd_task in the hisi_sas_slot_task_free(), and give up freeing of this time if task->lldd_task is NULL. 2. Set slot->buf to NULL after it is freed. Signed-off-by: Xiaofei Tan Signed-off-by: John Garry Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/hisi_sas/hisi_sas_main.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/scsi/hisi_sas/hisi_sas_main.c b/drivers/scsi/hisi_sas/hisi_sas_main.c index 16664f2e15fb..8fa9bb336ad4 100644 --- a/drivers/scsi/hisi_sas/hisi_sas_main.c +++ b/drivers/scsi/hisi_sas/hisi_sas_main.c @@ -185,13 +185,16 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task, struct domain_device *device = task->dev; struct hisi_sas_device *sas_dev = device->lldd_dev; + if (!task->lldd_task) + return; + + task->lldd_task = NULL; + if (!sas_protocol_ata(task->task_proto)) if (slot->n_elem) dma_unmap_sg(dev, task->scatter, slot->n_elem, task->data_dir); - task->lldd_task = NULL; - if (sas_dev) atomic64_dec(&sas_dev->running_req); } @@ -199,8 +202,8 @@ void hisi_sas_slot_task_free(struct hisi_hba *hisi_hba, struct sas_task *task, if (slot->buf) dma_pool_free(hisi_hba->buffer_pool, slot->buf, slot->buf_dma); - list_del_init(&slot->entry); + slot->buf = NULL; slot->task = NULL; slot->port = NULL; hisi_sas_slot_index_free(hisi_hba, slot->idx); From 7a7797199d9b96989ab5f30eadad077a8dc7c970 Mon Sep 17 00:00:00 2001 From: Martin Wilck Date: Fri, 20 Oct 2017 16:51:14 -0500 Subject: [PATCH 0507/1013] scsi: hpsa: cleanup sas_phy structures in sysfs when unloading [ Upstream commit 55ca38b4255bb336c2d35990bdb2b368e19b435a ] I am resubmitting this patch on behalf of Martin Wilck with his permission. The original patch can be found here: https://www.spinics.net/lists/linux-scsi/msg102083.html This patch did not help until Hannes's commit 9441284fbc39 ("scsi-fixup-kernel-warning-during-rmmod") was applied to the kernel. 
-------------------------------------- Original patch description from Martin: -------------------------------------- When the hpsa module is unloaded using rmmod, dangling symlinks remain under /sys/class/sas_phy. Fix this by calling sas_phy_delete() rather than sas_phy_free (which, according to comments, should not be called for PHYs that have been set up successfully, anyway). Tested-by: Don Brace Reviewed-by: Don Brace Signed-off-by: Martin Wilck Signed-off-by: Don Brace Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/hpsa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index 4ed3d26ffdde..832bc83fcd8e 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -9207,9 +9207,9 @@ static void hpsa_free_sas_phy(struct hpsa_sas_phy *hpsa_sas_phy) struct sas_phy *phy = hpsa_sas_phy->phy; sas_port_delete_phy(hpsa_sas_phy->parent_port->port, phy); - sas_phy_free(phy); if (hpsa_sas_phy->added_to_port) list_del(&hpsa_sas_phy->phy_list_entry); + sas_phy_delete(phy); kfree(hpsa_sas_phy); } From 8268e93b2b879912ae84952acef17e4c78f3b66d Mon Sep 17 00:00:00 2001 From: Martin Wilck Date: Fri, 20 Oct 2017 16:51:08 -0500 Subject: [PATCH 0508/1013] scsi: hpsa: destroy sas transport properties before scsi_host [ Upstream commit dfb2e6f46b3074eb85203d8f0888b71ec1c2e37a ] This patch cleans up a lot of warnings when unloading the driver. A current example of the stack trace starts with: [ 142.570715] sysfs group 'power' not found for kobject 'port-5:0' There can be hundreds of these messages during a driver unload. I am resubmitting this patch on behalf of Martin Wilck with his permission. His original patch can be found here: https://www.spinics.net/lists/linux-scsi/msg102085.html This patch did not help until Hannes's commit 9441284fbc39 ("scsi-fixup-kernel-warning-during-rmmod") was applied to the kernel. --------------------------- Original patch description: --------------------------- Unloading the hpsa driver causes warnings [ 1063.793652] WARNING: CPU: 1 PID: 4850 at ../fs/sysfs/group.c:237 device_del+0x54/0x240() [ 1063.793659] sysfs group ffffffff81cf21a0 not found for kobject 'port-2:0' with two different stacks: 1) [ 1063.793774] [] device_del+0x54/0x240 [ 1063.793780] [] transport_remove_classdev+0x4a/0x60 [ 1063.793784] [] attribute_container_device_trigger+0xa6/0xb0 [ 1063.793802] [] sas_port_delete+0x126/0x160 [scsi_transport_sas] [ 1063.793819] [] hpsa_free_sas_port+0x3c/0x70 [hpsa] 2) [ 1063.797103] [] device_del+0x54/0x240 [ 1063.797118] [] sas_port_delete+0x12e/0x160 [scsi_transport_sas] [ 1063.797134] [] hpsa_free_sas_port+0x3c/0x70 [hpsa] This is caused by the fact that host device hostX is deleted before the SAS transport devices hostX/port-a:b. This patch fixes this by reverting the order of device deletions. Tested-by: Don Brace Reviewed-by: Don Brace Signed-off-by: Martin Wilck Signed-off-by: Don Brace Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/hpsa.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/hpsa.c b/drivers/scsi/hpsa.c index 832bc83fcd8e..5fbaf13781b6 100644 --- a/drivers/scsi/hpsa.c +++ b/drivers/scsi/hpsa.c @@ -8684,6 +8684,8 @@ static void hpsa_remove_one(struct pci_dev *pdev) destroy_workqueue(h->rescan_ctlr_wq); destroy_workqueue(h->resubmit_wq); + hpsa_delete_sas_host(h); + /* * Call before disabling interrupts. 
* scsi_remove_host can trigger I/O operations especially @@ -8718,8 +8720,6 @@ static void hpsa_remove_one(struct pci_dev *pdev) h->lockup_detected = NULL; /* init_one 2 */ /* (void) pci_disable_pcie_error_reporting(pdev); */ /* init_one 1 */ - hpsa_delete_sas_host(h); - kfree(h); /* init_one 1 */ } From aeb01451f9e2d3e9b2ac2ad30426c68957c307ed Mon Sep 17 00:00:00 2001 From: Alexey Khoroshilov Date: Sat, 14 Oct 2017 01:06:56 +0300 Subject: [PATCH 0509/1013] mfd: mxs-lradc: Fix error handling in mxs_lradc_probe() [ Upstream commit 362741a21a5c4b9ee31e75ce28d63c6d238a745c ] There is the only path, where mxs_lradc_probe() leaves clk undisabled, since it does return instead of goto err_clk. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Alexey Khoroshilov Signed-off-by: Lee Jones Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/mxs-lradc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/mfd/mxs-lradc.c b/drivers/mfd/mxs-lradc.c index 630bd19b2c0a..98e732a7ae96 100644 --- a/drivers/mfd/mxs-lradc.c +++ b/drivers/mfd/mxs-lradc.c @@ -196,8 +196,10 @@ static int mxs_lradc_probe(struct platform_device *pdev) platform_set_drvdata(pdev, lradc); res = platform_get_resource(pdev, IORESOURCE_MEM, 0); - if (!res) - return -ENOMEM; + if (!res) { + ret = -ENOMEM; + goto err_clk; + } switch (lradc->soc) { case IMX23_LRADC: From 9409e1c775f35b2571be427cde272ce80cdc19e7 Mon Sep 17 00:00:00 2001 From: Lipeng Date: Mon, 23 Oct 2017 19:51:05 +0800 Subject: [PATCH 0510/1013] net: hns3: fix the TX/RX ring.queue_index in hns3_ring_get_cfg [ Upstream commit 66b447301ac710ee237dba8b653244018fbb6168 ] The interface hns3_ring_get_cfg only update TX ring queue_index, but do not update RX ring queue_index. This patch fixes it. Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC) Signed-off-by: Lipeng Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c index 52687f0691f1..8944d836ac9a 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c @@ -2488,16 +2488,16 @@ static int hns3_ring_get_cfg(struct hnae3_queue *q, struct hns3_nic_priv *priv, if (ring_type == HNAE3_RING_TYPE_TX) { ring_data[q->tqp_index].ring = ring; + ring_data[q->tqp_index].queue_index = q->tqp_index; ring->io_base = (u8 __iomem *)q->io_base + HNS3_TX_REG_OFFSET; } else { ring_data[q->tqp_index + queue_num].ring = ring; + ring_data[q->tqp_index + queue_num].queue_index = q->tqp_index; ring->io_base = q->io_base; } hnae_set_bit(ring->flag, HNAE3_RING_TYPE_B, ring_type); - ring_data[q->tqp_index].queue_index = q->tqp_index; - ring->tqp = q; ring->desc = NULL; ring->desc_cb = NULL; From 122e18d27d77031b909a06f110f6da225502d033 Mon Sep 17 00:00:00 2001 From: Lipeng Date: Mon, 23 Oct 2017 19:51:02 +0800 Subject: [PATCH 0511/1013] net: hns3: fix the bug when map buffer fail [ Upstream commit 564883bb4dc1a4f3cba6344e77743175694b0761 ] If one buffer had been recieved to stack, driver will alloc a new buffer, map the buffer to device and replace the old buffer. When map fail, should only free the new alloced buffer, but not free all buffers in the ring. 
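To make the error-path rule concrete — undo only what the failing step itself allocated, never the state still owned by the ring — here is a minimal user-space sketch. All names (struct buf, map_buf, replace_ring_buffer) are hypothetical and not taken from the hns3 driver.

#include <stdlib.h>

struct buf {
        void *data;
        int mapped;
};

static int map_buf(struct buf *b)
{
        /* Stands in for a DMA mapping step that can fail. */
        b->mapped = 1;
        return 0;
}

static int replace_ring_buffer(struct buf *ring, int i)
{
        struct buf nb = { .data = malloc(64), .mapped = 0 };

        if (!nb.data)
                return -1;

        if (map_buf(&nb) < 0) {
                /* Error path: release only the buffer just allocated;
                 * every other slot in the ring is still live. */
                free(nb.data);
                return -1;
        }

        /* Only on success does the ring slot take ownership. */
        ring[i] = nb;
        return 0;
}

int main(void)
{
        struct buf ring[4] = { { 0 } };
        int rc = replace_ring_buffer(ring, 0);

        if (ring[0].data)
                free(ring[0].data);
        return rc ? 1 : 0;
}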
Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC) Signed-off-by: Lipeng Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c index 8944d836ac9a..ea121b0b16ac 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c @@ -1546,7 +1546,7 @@ static int hns3_reserve_buffer_map(struct hns3_enet_ring *ring, return 0; out_with_buf: - hns3_free_buffers(ring); + hns3_free_buffer(ring, cb); out: return ret; } From 669ff2a9aa71edea93a0b74c1273080cad68dfb7 Mon Sep 17 00:00:00 2001 From: Lipeng Date: Mon, 23 Oct 2017 19:51:01 +0800 Subject: [PATCH 0512/1013] net: hns3: fix a bug when alloc new buffer [ Upstream commit b9077428ec5569aacb2952d8a2ffb51c8988d3c2 ] When alloce new buffer to HW, should unmap the old buffer first. This old code map the old buffer but not unmap the old buffer, this patch fixes it. Fixes: 76ad4f0 (net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC) Signed-off-by: Lipeng Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c index ea121b0b16ac..186772493711 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c @@ -1586,7 +1586,7 @@ static int hns3_alloc_ring_buffers(struct hns3_enet_ring *ring) static void hns3_replace_buffer(struct hns3_enet_ring *ring, int i, struct hns3_desc_cb *res_cb) { - hns3_map_buffer(ring, &ring->desc_cb[i]); + hns3_unmap_buffer(ring, &ring->desc_cb[i]); ring->desc_cb[i] = *res_cb; ring->desc[i].addr = cpu_to_le64(ring->desc_cb[i].dma); } From b59dc14b0b102d42d6bd54ae77848ddc00a4596c Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 16 Oct 2017 15:06:19 +0200 Subject: [PATCH 0513/1013] serdev: ttyport: enforce tty-driver open() requirement [ Upstream commit dee7d0f3b200c67c6ee96bd37c6e8fa52690ab56 ] The tty-driver open routine is mandatory, but the serdev tty-port-controller implementation did not treat it as such and would instead fall back to calling tty_port_open() directly. 
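The general shape of this fix is that a mandatory callback must be validated up front and the open path must fail cleanly rather than silently substitute a different code path. A small user-space sketch of that shape, with hypothetical names (port_ops, port_open); it is illustrative only, not the serdev code.

#include <errno.h>
#include <stddef.h>
#include <stdio.h>

struct port_ops {
        int (*open)(void *ctx);        /* mandatory in this sketch */
};

static int port_open(const struct port_ops *ops, void *ctx)
{
        if (!ops->open)
                return -ENODEV;        /* refuse instead of falling back */
        return ops->open(ctx);
}

static int real_open(void *ctx)
{
        (void)ctx;
        return 0;
}

int main(void)
{
        struct port_ops good = { .open = real_open };
        struct port_ops bad  = { .open = NULL };

        printf("good: %d\n", port_open(&good, NULL));
        printf("bad:  %d\n", port_open(&bad, NULL));
        return 0;
}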
Signed-off-by: Johan Hovold Acked-by: Rob Herring Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/tty/serdev/serdev-ttyport.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/drivers/tty/serdev/serdev-ttyport.c b/drivers/tty/serdev/serdev-ttyport.c index 7f785d77ba7f..69fc6d9ab490 100644 --- a/drivers/tty/serdev/serdev-ttyport.c +++ b/drivers/tty/serdev/serdev-ttyport.c @@ -120,10 +120,10 @@ static int ttyport_open(struct serdev_controller *ctrl) return PTR_ERR(tty); serport->tty = tty; - if (tty->ops->open) - tty->ops->open(serport->tty, NULL); - else - tty_port_open(serport->port, tty, NULL); + if (!tty->ops->open) + goto err_unlock; + + tty->ops->open(serport->tty, NULL); /* Bring the UART into a known 8 bits no parity hw fc state */ ktermios = tty->termios; @@ -140,6 +140,12 @@ static int ttyport_open(struct serdev_controller *ctrl) tty_unlock(serport->tty); return 0; + +err_unlock: + tty_unlock(tty); + tty_release_struct(tty, serport->tty_idx); + + return -ENODEV; } static void ttyport_close(struct serdev_controller *ctrl) From 8ee1eada4f1a6671517dc51f6bc11174faebea02 Mon Sep 17 00:00:00 2001 From: Michael Ellerman Date: Mon, 9 Oct 2017 21:52:44 +1100 Subject: [PATCH 0514/1013] powerpc/perf/hv-24x7: Fix incorrect comparison in memord [ Upstream commit 05c14c03138532a3cb2aa29c2960445c8753343b ] In the hv-24x7 code there is a function memord() which tries to implement a sort function return -1, 0, 1. However one of the conditions is incorrect, such that it can never be true, because we will have already returned. I don't believe there is a bug in practice though, because the comparisons are an optimisation prior to calling memcmp(). Fix it by swapping the second comparision, so it can be true. Reported-by: David Binderman Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/perf/hv-24x7.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c index 9c88b82f6229..72238eedc360 100644 --- a/arch/powerpc/perf/hv-24x7.c +++ b/arch/powerpc/perf/hv-24x7.c @@ -540,7 +540,7 @@ static int memord(const void *d1, size_t s1, const void *d2, size_t s2) { if (s1 < s2) return 1; - if (s2 > s1) + if (s1 > s2) return -1; return memcmp(d1, d2, s1); From 9cd01922985ce5c00cf050f12daef304cc452761 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Tue, 17 Oct 2017 16:20:18 -0200 Subject: [PATCH 0515/1013] powerpc/xmon: Check before calling xive functions [ Upstream commit 402e172a2ce76210f2fe921cf419d12103851344 ] Currently xmon could call XIVE functions from OPAL even if the XIVE is disabled or does not exist in the system, as in POWER8 machines. This causes the following exception: 1:mon> dx cpu 0x1: Vector: 700 (Program Check) at [c000000423c93450] pc: c00000000009cfa4: opal_xive_dump+0x50/0x68 lr: c0000000000997b8: opal_return+0x0/0x50 This patch simply checks if XIVE is enabled before calling XIVE functions. Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Suggested-by: Guilherme G. 
Piccoli Signed-off-by: Breno Leitao Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/xmon/xmon.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index 33351c6704b1..c008083fbc4f 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -2475,6 +2475,11 @@ static void dump_xives(void) unsigned long num; int c; + if (!xive_enabled()) { + printf("Xive disabled on this system\n"); + return; + } + c = inchar(); if (c == 'a') { dump_all_xives(); From 48185ffb6dc3eb64846039117d76148d3e3d7022 Mon Sep 17 00:00:00 2001 From: Matthias Brugger Date: Sat, 21 Oct 2017 10:17:47 +0200 Subject: [PATCH 0516/1013] soc: mediatek: pwrap: fix compiler errors [ Upstream commit fb2c1934f30577756e55e24e8870b45c78da3bc2 ] When compiling using sparse, we got the following error: drivers/soc/mediatek/mtk-pmic-wrap.c:686:25: error: dubious one-bit signed bitfield Changing the data type to unsigned fixes this. Signed-off-by: Matthias Brugger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/soc/mediatek/mtk-pmic-wrap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/soc/mediatek/mtk-pmic-wrap.c b/drivers/soc/mediatek/mtk-pmic-wrap.c index c2048382830f..e3df1e96b141 100644 --- a/drivers/soc/mediatek/mtk-pmic-wrap.c +++ b/drivers/soc/mediatek/mtk-pmic-wrap.c @@ -522,7 +522,7 @@ struct pmic_wrapper_type { u32 int_en_all; u32 spi_w; u32 wdt_src; - int has_bridge:1; + unsigned int has_bridge:1; int (*init_reg_clock)(struct pmic_wrapper *wrp); int (*init_soc_specific)(struct pmic_wrapper *wrp); }; From fdfcb06c5944afb78007cfc25e527ff689b2a344 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 18 Oct 2017 17:02:03 -0700 Subject: [PATCH 0517/1013] ipv4: ipv4_default_advmss() should use route mtu [ Upstream commit 164a5e7ad531e181334a3d3f03d0d5ad20d6faea ] ipv4_default_advmss() incorrectly uses the device MTU instead of the route provided one. IPv6 has the proper behavior, lets harmonize the two protocols. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/route.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 647cfc972bde..804bead564db 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1254,7 +1254,7 @@ static void set_class_tag(struct rtable *rt, u32 tag) static unsigned int ipv4_default_advmss(const struct dst_entry *dst) { unsigned int header_size = sizeof(struct tcphdr) + sizeof(struct iphdr); - unsigned int advmss = max_t(unsigned int, dst->dev->mtu - header_size, + unsigned int advmss = max_t(unsigned int, ipv4_mtu(dst) - header_size, ip_rt_min_advmss); return min(advmss, IPV4_MAX_PMTU - header_size); From b53af57679492180417d584be410b1134de15cd2 Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Thu, 19 Oct 2017 07:00:34 +0800 Subject: [PATCH 0518/1013] KVM: nVMX: Fix EPT switching advertising MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 575b3a2cb439b03fd603ea77c73c76f3ed237596 ] I can use vmxcap tool to observe "EPTP Switching yes" even if EPT is not exposed to L1. EPT switching is advertised unconditionally since it is emulated, however, it can be treated as an extended feature for EPT and it should not be advertised if EPT itself is not exposed. This patch fixes it. 
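The KVM change above boils down to a dependency check when a capability mask is built: a sub-feature bit is advertised only when its parent feature is enabled. A minimal user-space sketch of that pattern follows; the CAP_* constants are made up for illustration and are not the VMX control bits.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical capability bits, only for illustration. */
#define CAP_EPT            (1u << 0)
#define CAP_EPTP_SWITCHING (1u << 1)

static uint32_t build_nested_caps(int ept_enabled)
{
        uint32_t caps = 0;

        if (ept_enabled) {
                caps |= CAP_EPT;
                /* The extension is only advertised when its parent
                 * feature is exposed to the guest. */
                caps |= CAP_EPTP_SWITCHING;
        }
        return caps;
}

int main(void)
{
        printf("ept enabled:  %#x\n", (unsigned)build_nested_caps(1));
        printf("ept disabled: %#x\n", (unsigned)build_nested_caps(0));
        return 0;
}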
Reviewed-by: David Hildenbrand Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Jim Mattson Signed-off-by: Wanpeng Li Signed-off-by: Radim Krčmář Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/vmx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f366e6d3a5e1..bc5921c1e2f2 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2845,8 +2845,9 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx *vmx) * Advertise EPTP switching unconditionally * since we emulate it */ - vmx->nested.nested_vmx_vmfunc_controls = - VMX_VMFUNC_EPTP_SWITCHING; + if (enable_ept) + vmx->nested.nested_vmx_vmfunc_controls = + VMX_VMFUNC_EPTP_SWITCHING; } /* From 93fc8284472124140a7946c4e5f555e06808f4a2 Mon Sep 17 00:00:00 2001 From: nixiaoming Date: Fri, 15 Sep 2017 17:45:56 +0800 Subject: [PATCH 0519/1013] tty fix oops when rmmod 8250 [ Upstream commit c79dde629d2027ca80329c62854a7635e623d527 ] After rmmod 8250.ko tty_kref_put starts kwork (release_one_tty) to release proc interface oops when accessing driver->driver_name in proc_tty_unregister_driver Use jprobe, found driver->driver_name point to 8250.ko static static struct uart_driver serial8250_reg .driver_name= serial, Use name in proc_dir_entry instead of driver->driver_name to fix oops test on linux 4.1.12: BUG: unable to handle kernel paging request at ffffffffa01979de IP: [] strchr+0x0/0x30 PGD 1a0d067 PUD 1a0e063 PMD 851c1f067 PTE 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: ... ... [last unloaded: 8250] CPU: 7 PID: 116 Comm: kworker/7:1 Tainted: G O 4.1.12 #1 Hardware name: Insyde RiverForest/Type2 - Board Product Name1, BIOS NE5KV904 12/21/2015 Workqueue: events release_one_tty task: ffff88085b684960 ti: ffff880852884000 task.ti: ffff880852884000 RIP: 0010:[] [] strchr+0x0/0x30 RSP: 0018:ffff880852887c90 EFLAGS: 00010282 RAX: ffffffff81a5eca0 RBX: ffffffffa01979de RCX: 0000000000000004 RDX: ffff880852887d10 RSI: 000000000000002f RDI: ffffffffa01979de RBP: ffff880852887cd8 R08: 0000000000000000 R09: ffff88085f5d94d0 R10: 0000000000000195 R11: 0000000000000000 R12: ffffffffa01979de R13: ffff880852887d00 R14: ffffffffa01979de R15: ffff88085f02e840 FS: 0000000000000000(0000) GS:ffff88085f5c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa01979de CR3: 0000000001a0c000 CR4: 00000000001406e0 Stack: ffffffff812349b1 ffff880852887cb8 ffff880852887d10 ffff88085f5cd6c2 ffff880852800a80 ffffffffa01979de ffff880852800a84 0000000000000010 ffff88085bb28bd8 ffff880852887d38 ffffffff812354f0 ffff880852887d08 Call Trace: [] ? __xlate_proc_name+0x71/0xd0 [] remove_proc_entry+0x40/0x180 [] ? _raw_spin_lock_irqsave+0x41/0x60 [] ? destruct_tty_driver+0x60/0xe0 [] proc_tty_unregister_driver+0x28/0x40 [] destruct_tty_driver+0x88/0xe0 [] tty_driver_kref_put+0x1d/0x20 [] release_one_tty+0x5a/0xd0 [] process_one_work+0x139/0x420 [] worker_thread+0x121/0x450 [] ? process_scheduled_works+0x40/0x40 [] kthread+0xec/0x110 [] ? tg_rt_schedulable+0x210/0x220 [] ? kthread_freezable_should_stop+0x80/0x80 [] ret_from_fork+0x42/0x70 [] ? 
kthread_freezable_should_stop+0x80/0x80 Signed-off-by: nixiaoming Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/proc/proc_tty.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/proc/proc_tty.c b/fs/proc/proc_tty.c index 2da657848cfc..d0cf1c50bb6c 100644 --- a/fs/proc/proc_tty.c +++ b/fs/proc/proc_tty.c @@ -15,6 +15,7 @@ #include #include #include +#include "internal.h" /* * The /proc/tty directory inodes... @@ -165,7 +166,7 @@ void proc_tty_unregister_driver(struct tty_driver *driver) if (!ent) return; - remove_proc_entry(driver->driver_name, proc_tty_driver); + remove_proc_entry(ent->name, proc_tty_driver); driver->proc_entry = NULL; } From 4684da79d20edf232e33442775506e600fc4fe6f Mon Sep 17 00:00:00 2001 From: Ross Zwisler Date: Wed, 18 Oct 2017 12:21:55 -0600 Subject: [PATCH 0520/1013] dev/dax: fix uninitialized variable build warning [ Upstream commit 0a3ff78699d1817e711441715d22665475466036 ] Fix this build warning: warning: 'phys' may be used uninitialized in this function [-Wuninitialized] As reported here: https://lkml.org/lkml/2017/10/16/152 http://kisskb.ellerman.id.au/kisskb/buildresult/13181373/log/ Signed-off-by: Ross Zwisler Signed-off-by: Dan Williams Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/dax/device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 375b99bca002..7b0bf825c4e7 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -222,7 +222,8 @@ __weak phys_addr_t dax_pgoff_to_phys(struct dev_dax *dev_dax, pgoff_t pgoff, unsigned long size) { struct resource *res; - phys_addr_t phys; + /* gcc-4.6.3-nolibc for i386 complains that this is uninitialized */ + phys_addr_t uninitialized_var(phys); int i; for (i = 0; i < dev_dax->num_resources; i++) { From 57db94d4b72479c6c1e31d18e83a581d5d6d0ee9 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 11 Oct 2017 11:57:15 +0200 Subject: [PATCH 0521/1013] pinctrl: adi2: Fix Kconfig build problem [ Upstream commit 1c363531dd814dc4fe10865722bf6b0f72ce4673 ] The build robot is complaining on Blackfin: drivers/pinctrl/pinctrl-adi2.c: In function 'port_setup': >> drivers/pinctrl/pinctrl-adi2.c:221:21: error: dereferencing pointer to incomplete type 'struct gpio_port_t' writew(readw(®s->port_fer) & ~BIT(offset), ^~ drivers/pinctrl/pinctrl-adi2.c: In function 'adi_gpio_ack_irq': >> drivers/pinctrl/pinctrl-adi2.c:266:18: error: dereferencing pointer to incomplete type 'struct bfin_pint_regs' if (readl(®s->invert_set) & pintbit) ^~ It seems the driver need to include and to compile. The Blackfin architecture was re-defining the Kconfig PINCTRL symbol which is not OK, so replaced this with PINCTRL_BLACKFIN_ADI2 which selects PINCTRL and PINCTRL_ADI2 just like most arches do. Further, the old GPIO driver symbol GPIO_ADI was possible to select at the same time as selecting PINCTRL. This was not working because the arch-local header contains an explicit #ifndef PINCTRL clause making compilation break if you combine them. The same is true for DEBUG_MMRS. Make sure the ADI2 pinctrl driver is not selected at the same time as the old GPIO implementation. (This should be converted to use gpiolib or pincontrol and move to drivers/...) Also make sure the old GPIO_ADI driver or DEBUG_MMRS is not selected at the same time as the new PINCTRL implementation, and only make PINCTRL_ADI2 selectable for the Blackfin families that actually have it. 
This way it is still possible to add e.g. I2C-based pin control expanders on the Blackfin. Cc: Steven Miao Cc: Huanhuan Feng Signed-off-by: Linus Walleij Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/blackfin/Kconfig | 7 +++++-- arch/blackfin/Kconfig.debug | 1 + drivers/pinctrl/Kconfig | 3 ++- 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/blackfin/Kconfig b/arch/blackfin/Kconfig index af5369422032..d9c2866ba618 100644 --- a/arch/blackfin/Kconfig +++ b/arch/blackfin/Kconfig @@ -321,11 +321,14 @@ config BF53x config GPIO_ADI def_bool y + depends on !PINCTRL depends on (BF51x || BF52x || BF53x || BF538 || BF539 || BF561) -config PINCTRL +config PINCTRL_BLACKFIN_ADI2 def_bool y - depends on BF54x || BF60x + depends on (BF54x || BF60x) + select PINCTRL + select PINCTRL_ADI2 config MEM_MT48LC64M4A2FB_7E bool diff --git a/arch/blackfin/Kconfig.debug b/arch/blackfin/Kconfig.debug index 4ddd1b73ee3e..c8d957274cc2 100644 --- a/arch/blackfin/Kconfig.debug +++ b/arch/blackfin/Kconfig.debug @@ -18,6 +18,7 @@ config DEBUG_VERBOSE config DEBUG_MMRS tristate "Generate Blackfin MMR tree" + depends on !PINCTRL select DEBUG_FS help Create a tree of Blackfin MMRs via the debugfs tree. If diff --git a/drivers/pinctrl/Kconfig b/drivers/pinctrl/Kconfig index 82cd8b08d71f..a73c794bed03 100644 --- a/drivers/pinctrl/Kconfig +++ b/drivers/pinctrl/Kconfig @@ -33,7 +33,8 @@ config DEBUG_PINCTRL config PINCTRL_ADI2 bool "ADI pin controller driver" - depends on BLACKFIN + depends on (BF54x || BF60x) + depends on !GPIO_ADI select PINMUX select IRQ_DOMAIN help From 8151940acc356ba61e0992f464fca38118860c31 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 17 Oct 2017 16:18:36 +1100 Subject: [PATCH 0522/1013] raid5: Set R5_Expanded on parity devices as well as data. [ Upstream commit 235b6003fb28f0dd8e7ed8fbdb088bb548291766 ] When reshaping a fully degraded raid5/raid6 to a larger nubmer of devices, the new device(s) are not in-sync and so that can make the newly grown stripe appear to be "failed". To avoid this, we set the R5_Expanded flag to say "Even though this device is not fully in-sync, this block is safe so don't treat the device as failed for this stripe". This flag is set for data devices, not not for parity devices. Consequently, if you have a RAID6 with two devices that are partly recovered and a spare, and start a reshape to include the spare, then when the reshape gets past the point where the recovery was up to, it will think the stripes are failed and will get into an infinite loop, failing to make progress. So when contructing parity on an EXPAND_READY stripe, set R5_Expanded. 
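The raid5 fix above is a per-device flag update guarded by the stripe state. A simplified user-space sketch of that guarded update is shown below; the DEV_* and SH_* bits are stand-ins for the real R5_* and STRIPE_* flags, and the four-device stripe is purely illustrative.

#include <stdio.h>

/* Simplified stand-ins for the real R5_* and STRIPE_* flag bits. */
#define DEV_UPTODATE     (1u << 0)
#define DEV_EXPANDED     (1u << 1)
#define SH_EXPAND_READY  (1u << 0)

struct dev_demo { unsigned int flags; };

struct stripe_demo {
        unsigned int state;
        struct dev_demo devs[4];   /* data and parity devices alike */
};

static void complete_reconstruct(struct stripe_demo *sh)
{
        int i;

        for (i = 0; i < 4; i++) {
                struct dev_demo *dev = &sh->devs[i];

                dev->flags |= DEV_UPTODATE;
                /* While reshaping, mark every rebuilt block, parity
                 * included, so it is not treated as failed later. */
                if (sh->state & SH_EXPAND_READY)
                        dev->flags |= DEV_EXPANDED;
        }
}

int main(void)
{
        struct stripe_demo sh = { .state = SH_EXPAND_READY };

        complete_reconstruct(&sh);
        printf("dev0 flags: %#x\n", sh.devs[0].flags);
        return 0;
}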
Reported-by: Curt Signed-off-by: NeilBrown Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid5.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 7aed69a4f655..c406f16f5295 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1818,8 +1818,11 @@ static void ops_complete_reconstruct(void *stripe_head_ref) struct r5dev *dev = &sh->dev[i]; if (dev->written || i == pd_idx || i == qd_idx) { - if (!discard && !test_bit(R5_SkipCopy, &dev->flags)) + if (!discard && !test_bit(R5_SkipCopy, &dev->flags)) { set_bit(R5_UPTODATE, &dev->flags); + if (test_bit(STRIPE_EXPAND_READY, &sh->state)) + set_bit(R5_Expanded, &dev->flags); + } if (fua) set_bit(R5_WantFUA, &dev->flags); if (sync) From 84607a2958762c94741fc96e947e6a134cd5f5c3 Mon Sep 17 00:00:00 2001 From: Kurt Garloff Date: Tue, 17 Oct 2017 09:10:45 +0200 Subject: [PATCH 0523/1013] scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry [ Upstream commit 909cf3e16a5274fe2127cf3cea5c8dba77b2c412 ] All EMC SYMMETRIX support REPORT_LUNS, even if configured to report SCSI-2 for whatever reason. Signed-off-by: Kurt Garloff Signed-off-by: Hannes Reinecke Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/scsi_devinfo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/scsi_devinfo.c b/drivers/scsi/scsi_devinfo.c index 6bf43d94cdc0..b19b00adacb2 100644 --- a/drivers/scsi/scsi_devinfo.c +++ b/drivers/scsi/scsi_devinfo.c @@ -161,7 +161,7 @@ static struct { {"DGC", "RAID", NULL, BLIST_SPARSELUN}, /* Dell PV 650F, storage on LUN 0 */ {"DGC", "DISK", NULL, BLIST_SPARSELUN}, /* Dell PV 650F, no storage on LUN 0 */ {"EMC", "Invista", "*", BLIST_SPARSELUN | BLIST_LARGELUN}, - {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | BLIST_FORCELUN}, + {"EMC", "SYMMETRIX", NULL, BLIST_SPARSELUN | BLIST_LARGELUN | BLIST_REPORTLUN2}, {"EMULEX", "MD21/S2 ESDI", NULL, BLIST_SINGLELUN}, {"easyRAID", "16P", NULL, BLIST_NOREPORTLUN}, {"easyRAID", "X6P", NULL, BLIST_NOREPORTLUN}, From 9cb27c88b080274e5f94ae7a7c4da7275925ab33 Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Mon, 16 Oct 2017 08:45:15 +0300 Subject: [PATCH 0524/1013] IB/core: Fix use workqueue without WQ_MEM_RECLAIM [ Upstream commit 39baf10310e6669564a485b55267fae70a4e44ae ] The IB/core provides address resolution service and invokes callback handler when address resolve request completes of requester in worker thread context. Such caller might allocate or free memory in callback handler depending on the completion status to make further progress or to terminate a connection. Most ULPs resolve route which involves allocating route entry and path record elements in callback event handler. It has been noticed that WQ_MEM_RECLAIM flag should not be used for workers that tend to allocate memory in this [1] thread discussion. In order to mitigate this situation, WQ_MEM_RECLAIM flag was dropped for other such WQs in this [2] patch. Similar problem might arise with address resolution path, though its not yet noticed. The ib_addr workqueue is not memory reclaim path due to its nature of invoking callback that might allocate memory or don't free any memory under memory pressure. 
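As a rough kernel-module-style sketch of the resulting pattern — an ordered workqueue created without WQ_MEM_RECLAIM because its work items may allocate memory — consider the fragment below. The names (demo_wq, demo_resolve_fn) are hypothetical; this is illustrative only, not the ib_addr code.

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

static struct workqueue_struct *demo_wq;
static struct work_struct demo_work;

static void demo_resolve_fn(struct work_struct *work)
{
        /* The handler may allocate memory, so the queue must not
         * present itself as part of the memory-reclaim path. */
        void *buf = kzalloc(256, GFP_KERNEL);

        kfree(buf);
}

static int __init demo_init(void)
{
        /* Ordered queue, created without WQ_MEM_RECLAIM. */
        demo_wq = alloc_ordered_workqueue("demo_addr", 0);
        if (!demo_wq)
                return -ENOMEM;

        INIT_WORK(&demo_work, demo_resolve_fn);
        queue_work(demo_wq, &demo_work);
        return 0;
}

static void __exit demo_exit(void)
{
        destroy_workqueue(demo_wq);   /* drains pending work first */
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");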
[1] https://www.spinics.net/lists/linux-rdma/msg53239.html [2] https://www.spinics.net/lists/linux-rdma/msg53416.html Fixes: f54816261c2b ("IB/addr: Remove deprecated create_singlethread_workqueue") Fixes: 5fff41e1f89d ("IB/core: Fix race condition in resolving IP to MAC") Signed-off-by: Parav Pandit Reviewed-by: Daniel Jurgens Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/addr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c index 12523f630b61..d2f74721b3ba 100644 --- a/drivers/infiniband/core/addr.c +++ b/drivers/infiniband/core/addr.c @@ -852,7 +852,7 @@ static struct notifier_block nb = { int addr_init(void) { - addr_wq = alloc_ordered_workqueue("ib_addr", WQ_MEM_RECLAIM); + addr_wq = alloc_ordered_workqueue("ib_addr", 0); if (!addr_wq) return -ENOMEM; From b57259ca055f7f2c7237dcc0e5aac446b2fa6da9 Mon Sep 17 00:00:00 2001 From: Parav Pandit Date: Mon, 16 Oct 2017 08:45:16 +0300 Subject: [PATCH 0525/1013] IB/core: Fix calculation of maximum RoCE MTU [ Upstream commit 99260132fde7bddc6e0132ce53da94d1c9ccabcb ] The original code only took into consideration the largest header possible after the IB_BTH_BYTES. This was incorrect, as the largest possible header size is the largest possible combination of headers we might run into. The new code accounts for all possible headers in the largest possible combination and subtracts that from the MTU to make sure that all packets will fit on the wire. Link: https://www.spinics.net/lists/linux-rdma/msg54558.html Fixes: 3c86aa70bf67 ("RDMA/cm: Add RDMA CM support for IBoE devices") Signed-off-by: Parav Pandit Reviewed-by: Daniel Jurgens Reported-by: Roland Dreier Signed-off-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/rdma/ib_addr.h | 7 ++++--- include/rdma/ib_pack.h | 19 +++++++++++-------- 2 files changed, 15 insertions(+), 11 deletions(-) diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index ec5008cf5d51..8815989301ab 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -245,10 +245,11 @@ static inline void rdma_addr_set_dgid(struct rdma_dev_addr *dev_addr, union ib_g static inline enum ib_mtu iboe_get_mtu(int mtu) { /* - * reduce IB headers from effective IBoE MTU. 28 stands for - * atomic header which is the biggest possible header after BTH + * Reduce IB headers from effective IBoE MTU. 
*/ - mtu = mtu - IB_GRH_BYTES - IB_BTH_BYTES - 28; + mtu = mtu - (IB_GRH_BYTES + IB_UDP_BYTES + IB_BTH_BYTES + + IB_EXT_XRC_BYTES + IB_EXT_ATOMICETH_BYTES + + IB_ICRC_BYTES); if (mtu >= ib_mtu_enum_to_int(IB_MTU_4096)) return IB_MTU_4096; diff --git a/include/rdma/ib_pack.h b/include/rdma/ib_pack.h index 36655899ee02..7ea1382ad0e5 100644 --- a/include/rdma/ib_pack.h +++ b/include/rdma/ib_pack.h @@ -37,14 +37,17 @@ #include enum { - IB_LRH_BYTES = 8, - IB_ETH_BYTES = 14, - IB_VLAN_BYTES = 4, - IB_GRH_BYTES = 40, - IB_IP4_BYTES = 20, - IB_UDP_BYTES = 8, - IB_BTH_BYTES = 12, - IB_DETH_BYTES = 8 + IB_LRH_BYTES = 8, + IB_ETH_BYTES = 14, + IB_VLAN_BYTES = 4, + IB_GRH_BYTES = 40, + IB_IP4_BYTES = 20, + IB_UDP_BYTES = 8, + IB_BTH_BYTES = 12, + IB_DETH_BYTES = 8, + IB_EXT_ATOMICETH_BYTES = 28, + IB_EXT_XRC_BYTES = 4, + IB_ICRC_BYTES = 4 }; struct ib_field { From 4e2836b43102633dc9bffbc170be43e1d147914c Mon Sep 17 00:00:00 2001 From: Jia-Ju Bai Date: Mon, 9 Oct 2017 16:45:55 +0800 Subject: [PATCH 0526/1013] vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend [ Upstream commit 42c8eb3f6e15367981b274cb79ee4657e2c6949d ] The driver may sleep under a spinlock, and the function call path is: vt6655_suspend (acquire the spinlock) pci_set_power_state __pci_start_power_transition (drivers/pci/pci.c) msleep --> may sleep To fix it, pci_set_power_state is called without having a spinlock. This bug is found by my static analysis tool and my code review. Signed-off-by: Jia-Ju Bai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/vt6655/device_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/staging/vt6655/device_main.c b/drivers/staging/vt6655/device_main.c index 9fcf2e223f71..1123b4f1e1d6 100644 --- a/drivers/staging/vt6655/device_main.c +++ b/drivers/staging/vt6655/device_main.c @@ -1693,10 +1693,11 @@ static int vt6655_suspend(struct pci_dev *pcid, pm_message_t state) MACbShutdown(priv); pci_disable_device(pcid); - pci_set_power_state(pcid, pci_choose_state(pcid, state)); spin_unlock_irqrestore(&priv->lock, flags); + pci_set_power_state(pcid, pci_choose_state(pcid, state)); + return 0; } From f18b2039d904f344dae779451592814e2c646686 Mon Sep 17 00:00:00 2001 From: Don Hiatt Date: Mon, 9 Oct 2017 12:38:12 -0700 Subject: [PATCH 0527/1013] IB/hfi1: Mask out A bit from psn trace [ Upstream commit d0a2f454713a42447ee4007582c0e43c47bcf230 ] The trace logic prior to the fixes below used to mask the A bit from the psn. It now mistakenly displays the A bit, which is already displayed separately. Fix by adding the appropriate mask to the psn tracing. 
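For context on the hfi1 trace fix: in the BTH, bit 31 of the relevant dword is the A (ack request) bit and the packet sequence number occupies the low 24 bits, so trace output must mask the flag off before printing. A user-space sketch of that masking follows; the constant names are illustrative, not the driver's.

#include <stdint.h>
#include <stdio.h>

/* BTH layout per the InfiniBand spec: bit 31 of this dword is the A
 * (ack request) bit, the packet sequence number is the low 24 bits.
 * The constant names are illustrative, not the driver's. */
#define BTH_ACK_BIT   0x80000000u
#define BTH_PSN_MASK  0x00ffffffu

static uint32_t trace_psn(uint32_t bth2)
{
        return bth2 & BTH_PSN_MASK;   /* drop the A and reserved bits */
}

int main(void)
{
        uint32_t bth2 = BTH_ACK_BIT | 0x000123u;

        printf("raw field: %#010x\n", (unsigned)bth2);
        printf("psn only:  %#010x\n", (unsigned)trace_psn(bth2));
        return 0;
}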
Fixes: 228d2af1b723 ("IB/hfi1: Separate input/output header tracing") Fixes: 863cf89d472f ("IB/hfi1: Add 16B trace support") Reviewed-by: Mike Marciniszyn Signed-off-by: Don Hiatt Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/trace.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/trace.c b/drivers/infiniband/hw/hfi1/trace.c index 9938bb983ce6..9749ec9dd9f2 100644 --- a/drivers/infiniband/hw/hfi1/trace.c +++ b/drivers/infiniband/hw/hfi1/trace.c @@ -154,7 +154,7 @@ void hfi1_trace_parse_9b_bth(struct ib_other_headers *ohdr, *opcode = ib_bth_get_opcode(ohdr); *tver = ib_bth_get_tver(ohdr); *pkey = ib_bth_get_pkey(ohdr); - *psn = ib_bth_get_psn(ohdr); + *psn = mask_psn(ib_bth_get_psn(ohdr)); *qpn = ib_bth_get_qpn(ohdr); } @@ -169,7 +169,7 @@ void hfi1_trace_parse_16b_bth(struct ib_other_headers *ohdr, *pad = ib_bth_get_pad(ohdr); *se = ib_bth_get_se(ohdr); *tver = ib_bth_get_tver(ohdr); - *psn = ib_bth_get_psn(ohdr); + *psn = mask_psn(ib_bth_get_psn(ohdr)); *qpn = ib_bth_get_qpn(ohdr); } From f898f36664a5d64a55c1257e55dfacaae6a86bb7 Mon Sep 17 00:00:00 2001 From: Jia-Ju Bai Date: Sun, 8 Oct 2017 19:54:45 +0800 Subject: [PATCH 0528/1013] rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_createbss_cmd [ Upstream commit 2bf9806d4228f7a6195f8e03eda0479d2a93b411 ] The driver may sleep under a spinlock, and the function call path is: rtw_surveydone_event_callback(acquire the spinlock) rtw_createbss_cmd kzalloc(GFP_KERNEL) --> may sleep To fix it, GFP_KERNEL is replaced with GFP_ATOMIC. This bug is found by my static analysis tool and my code review. Signed-off-by: Jia-Ju Bai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_cmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/rtl8188eu/core/rtw_cmd.c b/drivers/staging/rtl8188eu/core/rtw_cmd.c index 9461bce883ea..430b8eb58976 100644 --- a/drivers/staging/rtl8188eu/core/rtw_cmd.c +++ b/drivers/staging/rtl8188eu/core/rtw_cmd.c @@ -333,7 +333,7 @@ u8 rtw_createbss_cmd(struct adapter *padapter) else RT_TRACE(_module_rtl871x_cmd_c_, _drv_info_, (" createbss for SSid:%s\n", pmlmepriv->assoc_ssid.Ssid)); - pcmd = kzalloc(sizeof(struct cmd_obj), GFP_KERNEL); + pcmd = kzalloc(sizeof(struct cmd_obj), GFP_ATOMIC); if (!pcmd) { res = _FAIL; goto exit; From f2d81d0f030afc7a873bb3454eaf59f384543baf Mon Sep 17 00:00:00 2001 From: Jia-Ju Bai Date: Sun, 8 Oct 2017 19:54:07 +0800 Subject: [PATCH 0529/1013] rtl8188eu: Fix a possible sleep-in-atomic bug in rtw_disassoc_cmd [ Upstream commit 08880f8e08cbd814e870e9d3ab9530abc1bce226 ] The driver may sleep under a spinlock, and the function call path is: rtw_set_802_11_bssid(acquire the spinlock) rtw_disassoc_cmd kzalloc(GFP_KERNEL) --> may sleep To fix it, GFP_KERNEL is replaced with GFP_ATOMIC. This bug is found by my static analysis tool and my code review. 
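Both rtl8188eu fixes in this series apply the same rule: an allocation made while a spinlock is held must not sleep, so GFP_KERNEL has to become GFP_ATOMIC. The kernel-style fragment below sketches that pattern with hypothetical names; it is not the driver's code and is not a complete module.

#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_lock);

static int demo_build_cmd_under_lock(void)
{
        unsigned long flags;
        void *cmd;
        int ret = -ENOMEM;

        spin_lock_irqsave(&demo_lock, flags);

        /* The lock is held, so sleeping is forbidden: GFP_ATOMIC,
         * never GFP_KERNEL, for allocations on this path. */
        cmd = kzalloc(64, GFP_ATOMIC);
        if (cmd) {
                /* ... fill in and queue the command ... */
                kfree(cmd);
                ret = 0;
        }

        spin_unlock_irqrestore(&demo_lock, flags);

        return ret;
}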
Signed-off-by: Jia-Ju Bai Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/rtl8188eu/core/rtw_cmd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/rtl8188eu/core/rtw_cmd.c b/drivers/staging/rtl8188eu/core/rtw_cmd.c index 430b8eb58976..be8542676adf 100644 --- a/drivers/staging/rtl8188eu/core/rtw_cmd.c +++ b/drivers/staging/rtl8188eu/core/rtw_cmd.c @@ -508,7 +508,7 @@ u8 rtw_disassoc_cmd(struct adapter *padapter, u32 deauth_timeout_ms, bool enqueu if (enqueue) { /* need enqueue, prepare cmd_obj and enqueue */ - cmdobj = kzalloc(sizeof(*cmdobj), GFP_KERNEL); + cmdobj = kzalloc(sizeof(*cmdobj), GFP_ATOMIC); if (!cmdobj) { res = _FAIL; kfree(param); From c97e41076a298dbc4e910c33048e553658388eed Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Tue, 17 Oct 2017 16:54:52 +0100 Subject: [PATCH 0530/1013] ipmi_si: fix memory leak on new_smi [ Upstream commit c0a32fe13cd323ca9420500b16fd69589c9ba91e ] The error exit path omits kfree'ing the allocated new_smi, causing a memory leak. Fix this by kfree'ing new_smi. Detected by CoverityScan, CID#14582571 ("Resource Leak") Fixes: 7e030d6dff71 ("ipmi: Prefer ACPI system interfaces over SMBIOS ones") Signed-off-by: Colin Ian King Signed-off-by: Corey Minyard Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/char/ipmi/ipmi_si_intf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index c04aa11f0e21..e1cbb78c6806 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3469,6 +3469,7 @@ static int add_smi(struct smi_info *new_smi) ipmi_addr_src_to_str(new_smi->addr_source), si_to_str[new_smi->si_type]); rv = -EBUSY; + kfree(new_smi); goto out_err; } } From fc8f4ca137d9525b6f7db4b1d3e8f07a087518ec Mon Sep 17 00:00:00 2001 From: Wei Yongjun Date: Tue, 17 Oct 2017 12:11:46 +0000 Subject: [PATCH 0531/1013] nullb: fix error return code in null_init() [ Upstream commit 30c516d750396c5f3ec9cb04c9e025c25e91495e ] Fix to return error code -ENOMEM from the null_alloc_dev() error handling case instead of 0, as done elsewhere in this function. Fixes: 2984c8684f96 ("nullb: factor disk parameters") Signed-off-by: Wei Yongjun Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/block/null_blk.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/block/null_blk.c b/drivers/block/null_blk.c index 8042c26ea9e6..4d55af5c6e5b 100644 --- a/drivers/block/null_blk.c +++ b/drivers/block/null_blk.c @@ -1985,8 +1985,10 @@ static int __init null_init(void) for (i = 0; i < nr_devices; i++) { dev = null_alloc_dev(); - if (!dev) + if (!dev) { + ret = -ENOMEM; goto err_dev; + } ret = null_add_dev(dev); if (ret) { null_free_dev(dev); From 47b5dbdd983e0319cd5d135bbfe7e564f4bbaabc Mon Sep 17 00:00:00 2001 From: weiping zhang Date: Thu, 12 Oct 2017 14:57:06 +0800 Subject: [PATCH 0532/1013] scsi: sd: change manage_start_stop to bool in sysfs interface [ Upstream commit 623401ee33e42cee64d333877892be8db02951eb ] /sys/class/scsi_disk/0:2:0:0/manage_start_stop can be changed to 0 unexpectly by writing an invalid string. Signed-off-by: weiping zhang Signed-off-by: Martin K. 
Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sd.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index d841743b2107..b801212eefb5 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -231,11 +231,15 @@ manage_start_stop_store(struct device *dev, struct device_attribute *attr, { struct scsi_disk *sdkp = to_scsi_disk(dev); struct scsi_device *sdp = sdkp->device; + bool v; if (!capable(CAP_SYS_ADMIN)) return -EACCES; - sdp->manage_start_stop = simple_strtoul(buf, NULL, 10); + if (kstrtobool(buf, &v)) + return -EINVAL; + + sdp->manage_start_stop = v; return count; } From dd2581c675e4311abc68826c25ec2563bccb3337 Mon Sep 17 00:00:00 2001 From: weiping zhang Date: Thu, 12 Oct 2017 14:56:44 +0800 Subject: [PATCH 0533/1013] scsi: sd: change allow_restart to bool in sysfs interface [ Upstream commit 658e9a6dc1126f21fa417cd213e1cdbff8be0ba2 ] /sys/class/scsi_disk/0:2:0:0/allow_restart can be changed to 0 unexpectedly by writing an invalid string such as the following: echo asdf > /sys/class/scsi_disk/0:2:0:0/allow_restart Signed-off-by: weiping zhang Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/sd.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c index b801212eefb5..72db0f7d221a 100644 --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -257,6 +257,7 @@ static ssize_t allow_restart_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { + bool v; struct scsi_disk *sdkp = to_scsi_disk(dev); struct scsi_device *sdp = sdkp->device; @@ -266,7 +267,10 @@ allow_restart_store(struct device *dev, struct device_attribute *attr, if (sdp->type != TYPE_DISK && sdp->type != TYPE_ZBC) return -EINVAL; - sdp->allow_restart = simple_strtoul(buf, NULL, 10); + if (kstrtobool(buf, &v)) + return -EINVAL; + + sdp->allow_restart = v; return count; } From b197f67ccfeb86d180a152796b5e3b7de1f611ef Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Wed, 4 Oct 2017 10:50:37 +0300 Subject: [PATCH 0534/1013] scsi: bfa: integer overflow in debugfs [ Upstream commit 3e351275655d3c84dc28abf170def9786db5176d ] We could allocate less memory than intended because we do: bfad->regdata = kzalloc(len << 2, GFP_KERNEL); The shift can overflow leading to a crash. This is debugfs code so the impact is very small. I fixed the network version of this in March with commit 13e2d5187f6b ("bna: integer overflow bug in debugfs"). Fixes: ab2a9ba189e8 ("[SCSI] bfa: add debugfs support") Signed-off-by: Dan Carpenter Signed-off-by: Martin K. 
Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/bfa/bfad_debugfs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/scsi/bfa/bfad_debugfs.c b/drivers/scsi/bfa/bfad_debugfs.c index 8dcd8c70c7ee..05f523971348 100644 --- a/drivers/scsi/bfa/bfad_debugfs.c +++ b/drivers/scsi/bfa/bfad_debugfs.c @@ -255,7 +255,8 @@ bfad_debugfs_write_regrd(struct file *file, const char __user *buf, struct bfad_s *bfad = port->bfad; struct bfa_s *bfa = &bfad->bfa; struct bfa_ioc_s *ioc = &bfa->ioc; - int addr, len, rc, i; + int addr, rc, i; + u32 len; u32 *regbuf; void __iomem *rb, *reg_addr; unsigned long flags; @@ -266,7 +267,7 @@ bfad_debugfs_write_regrd(struct file *file, const char __user *buf, return PTR_ERR(kern_buf); rc = sscanf(kern_buf, "%x:%x", &addr, &len); - if (rc < 2) { + if (rc < 2 || len > (UINT_MAX >> 2)) { printk(KERN_INFO "bfad[%d]: %s failed to read user buf\n", bfad->inst_no, __func__); From 339aba679813ff1fabde362147919d3f6e536568 Mon Sep 17 00:00:00 2001 From: Artur Paszkiewicz Date: Fri, 29 Sep 2017 22:54:19 +0200 Subject: [PATCH 0535/1013] raid5-ppl: check recovery_offset when performing ppl recovery [ Upstream commit 07719ff767dcd8cc42050f185d332052f3816546 ] If starting an array that is undergoing rebuild, make ppl recovery honor the recovery_offset of a member disk and don't read data that is not yet in-sync. Signed-off-by: Artur Paszkiewicz Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid5-ppl.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/md/raid5-ppl.c b/drivers/md/raid5-ppl.c index cd026c88f7ef..702b76008886 100644 --- a/drivers/md/raid5-ppl.c +++ b/drivers/md/raid5-ppl.c @@ -758,7 +758,8 @@ static int ppl_recover_entry(struct ppl_log *log, struct ppl_header_entry *e, (unsigned long long)sector); rdev = conf->disks[dd_idx].rdev; - if (!rdev) { + if (!rdev || (!test_bit(In_sync, &rdev->flags) && + sector >= rdev->recovery_offset)) { pr_debug("%s:%*s data member disk %d missing\n", __func__, indent, "", dd_idx); update_parity = false; From 46788d19e137fc48d93f0a29eade2e02e05317f6 Mon Sep 17 00:00:00 2001 From: Guoqing Jiang Date: Fri, 29 Sep 2017 09:16:43 +0800 Subject: [PATCH 0536/1013] md-cluster: fix wrong condition check in raid1_write_request [ Upstream commit 385f4d7f946b08f36f68b0a28e95a319925b6b62 ] The check used here is to avoid conflict between write and resync, however we used the wrong logic, it should be the inverse of the checking inside "if". 
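The raid1 bug above is essentially a grouping/negation error when turning "block while a conflict exists" into "proceed when no conflict exists": the corrected condition requires both that the write lies outside the suspended range and that no cluster resync covers it. A small user-space sketch of the corrected predicate, with simplified stand-ins for the suspend window and the cluster resync check:

#include <stdbool.h>
#include <stdio.h>

struct win { long lo, hi; };

/* True if the write [start, end) overlaps the suspended window. */
static bool overlaps_suspend(const struct win *w, long start, long end)
{
        return end > w->lo && start < w->hi;
}

/* Proceed only when the write is outside the suspend window AND the
 * cluster is not resyncing that range (or clustering is off). */
static bool may_write(const struct win *w, long start, long end,
                      bool clustered, bool area_resyncing)
{
        return !overlaps_suspend(w, start, end) &&
               (!clustered || !area_resyncing);
}

int main(void)
{
        struct win w = { .lo = 100, .hi = 200 };

        printf("%d\n", may_write(&w, 0, 50, true, false));    /* 1: safe */
        printf("%d\n", may_write(&w, 150, 160, false, false)); /* 0: in window */
        return 0;
}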
Fixes: 589a1c4 ("Suspend writes in RAID1 if within range") Signed-off-by: Guoqing Jiang Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/raid1.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index e4e8f9e565b7..5a8216b50e38 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -1309,12 +1309,12 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio, sigset_t full, old; prepare_to_wait(&conf->wait_barrier, &w, TASK_INTERRUPTIBLE); - if (bio_end_sector(bio) <= mddev->suspend_lo || - bio->bi_iter.bi_sector >= mddev->suspend_hi || - (mddev_is_clustered(mddev) && + if ((bio_end_sector(bio) <= mddev->suspend_lo || + bio->bi_iter.bi_sector >= mddev->suspend_hi) && + (!mddev_is_clustered(mddev) || !md_cluster_ops->area_resyncing(mddev, WRITE, - bio->bi_iter.bi_sector, - bio_end_sector(bio)))) + bio->bi_iter.bi_sector, + bio_end_sector(bio)))) break; sigfillset(&full); sigprocmask(SIG_BLOCK, &full, &old); From 42364042b7dd4ba7fed36c1db0a157c7c48282ca Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 9 Oct 2017 12:03:26 -0400 Subject: [PATCH 0537/1013] xprtrdma: Don't defer fencing an async RPC's chunks [ Upstream commit 8f66b1a529047a972cb9602a919c53a95f3d7a2b ] In current kernels, waiting in xprt_release appears to be safe to do. I had erroneously believed that for ASYNC RPCs, waiting of any kind in xprt_release->xprt_rdma_free would result in deadlock. I've done injection testing and consulted with Trond to confirm that waiting in the RPC release path is safe. For the very few times where RPC resources haven't yet been released earlier by the reply handler, it is safe to wait synchronously in xprt_rdma_free for invalidation rather than defering it to MR recovery. Note: When the QP is error state, posting a LocalInvalidate should flush and mark the MR as bad. There is no way the remote HCA can access that MR via a QP in error state, so it is effectively already inaccessible and thus safe for the Upper Layer to access. The next time the MR is used it should be recognized and cleaned up properly by frwr_op_map. Signed-off-by: Chuck Lever Signed-off-by: Anna Schumaker Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/sunrpc/xprtrdma/transport.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c index c84e2b644e13..8cf5ccfe180d 100644 --- a/net/sunrpc/xprtrdma/transport.c +++ b/net/sunrpc/xprtrdma/transport.c @@ -686,7 +686,7 @@ xprt_rdma_free(struct rpc_task *task) dprintk("RPC: %s: called on 0x%p\n", __func__, req->rl_reply); if (!list_empty(&req->rl_registered)) - ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task)); + ia->ri_ops->ro_unmap_sync(r_xprt, &req->rl_registered); rpcrdma_unmap_sges(ia, req); rpcrdma_buffer_put(req); } From d9490e7ca55bafac97fa65a3d34244386469eb50 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 16 Oct 2017 11:38:11 +0200 Subject: [PATCH 0538/1013] udf: Avoid overflow when session starts at large offset [ Upstream commit abdc0eb06964fe1d2fea6dd1391b734d0590365d ] When session starts beyond offset 2^31 the arithmetics in udf_check_vsd() would overflow. Make sure the computation is done in large enough type. 
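The udf failure mode is a 32-bit value being shifted left before it is widened: the overflow happens in the narrow type and the cast to a 64-bit offset comes too late. A user-space demonstration follows (2048-byte blocks, i.e. an 11-bit shift; the session value is an arbitrary large example).

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
        uint32_t session_start = 3000000;   /* sectors; arbitrarily large */
        unsigned int blkbits = 11;          /* 2048-byte blocks */

        /* Wrong: the shift is evaluated in 32 bits and wraps. */
        int64_t bad = session_start << blkbits;

        /* Right: widen first, then shift. */
        int64_t good = ((int64_t)session_start) << blkbits;

        printf("bad:  %" PRId64 "\n", bad);
        printf("good: %" PRId64 "\n", good);
        return 0;
}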
Reported-by: Cezary Sliwa Signed-off-by: Jan Kara Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- fs/udf/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/udf/super.c b/fs/udf/super.c index 99cb81d0077f..08bf097507f6 100644 --- a/fs/udf/super.c +++ b/fs/udf/super.c @@ -703,7 +703,7 @@ static loff_t udf_check_vsd(struct super_block *sb) else sectorsize = sb->s_blocksize; - sector += (sbi->s_session << sb->s_blocksize_bits); + sector += (((loff_t)sbi->s_session) << sb->s_blocksize_bits); udf_debug("Starting at sector %u (%ld byte sectors)\n", (unsigned int)(sector >> sb->s_blocksize_bits), From f7af60377ec05e2452a73c349b4fa2a4196480c8 Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Fri, 13 Oct 2017 13:40:24 -0700 Subject: [PATCH 0539/1013] macvlan: Only deliver one copy of the frame to the macvlan interface [ Upstream commit dd6b9c2c332b40f142740d1b11fb77c653ff98ea ] This patch intoduces a slight adjustment for macvlan to address the fact that in source mode I was seeing two copies of any packet addressed to the macvlan interface being delivered where there should have been only one. The issue appears to be that one copy was delivered based on the source MAC address and then the second copy was being delivered based on the destination MAC address. To fix it I am just treating a unicast address match as though it is not a match since source based macvlan isn't supposed to be matching based on the destination MAC anyway. Fixes: 79cf79abce71 ("macvlan: add source mode") Signed-off-by: Alexander Duyck Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/macvlan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/macvlan.c b/drivers/net/macvlan.c index d2aea961e0f4..fb1c9e095d0c 100644 --- a/drivers/net/macvlan.c +++ b/drivers/net/macvlan.c @@ -480,7 +480,7 @@ static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb) struct macvlan_dev, list); else vlan = macvlan_hash_lookup(port, eth->h_dest); - if (vlan == NULL) + if (!vlan || vlan->mode == MACVLAN_MODE_SOURCE) return RX_HANDLER_PASS; dev = vlan->dev; From 9fc290e529b4948b2b18fa943d8bd5c3a5d5b6e1 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 11 Oct 2017 10:48:43 -0700 Subject: [PATCH 0540/1013] IB/core: Fix endianness annotation in rdma_is_multicast_addr() [ Upstream commit 1c3aea2bc8f0b2e5b57375ead40457ff75a3a2ec ] Since ipv4_addr is a big endian 32-bit number, annotate it as such. 
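The point of the endianness annotation above is that the IPv4 address embedded in a v4-mapped IPv6 address is stored in network (big-endian) byte order, so it must be kept in a big-endian type and compared against big-endian constants. A user-space sketch of the same multicast test (224.0.0.0/4 is the IPv4 multicast range); it is a simplified illustration, not the kernel helper.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* True if addr is an IPv4-mapped IPv6 address whose embedded IPv4
 * address is multicast (224.0.0.0/4). */
static int mapped_v4_multicast(const struct in6_addr *addr)
{
        uint32_t v4;    /* kept in network byte order, like __be32 */

        if (!IN6_IS_ADDR_V4MAPPED(addr))
                return 0;

        memcpy(&v4, &addr->s6_addr[12], sizeof(v4));
        return (v4 & htonl(0xf0000000)) == htonl(0xe0000000);
}

int main(void)
{
        struct in6_addr a;

        inet_pton(AF_INET6, "::ffff:239.1.2.3", &a);
        printf("239.1.2.3 multicast: %d\n", mapped_v4_multicast(&a));

        inet_pton(AF_INET6, "::ffff:10.0.0.1", &a);
        printf("10.0.0.1  multicast: %d\n", mapped_v4_multicast(&a));
        return 0;
}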
Fixes: commit be1d325a3358 ("IB/core: Set RoCEv2 MGID according to spec") Signed-off-by: Bart Van Assche Reviewed-by: Leon Romanovsky Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- include/rdma/ib_addr.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index 8815989301ab..b2a10c762304 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -306,12 +306,12 @@ static inline void rdma_get_ll_mac(struct in6_addr *addr, u8 *mac) static inline int rdma_is_multicast_addr(struct in6_addr *addr) { - u32 ipv4_addr; + __be32 ipv4_addr; if (addr->s6_addr[0] == 0xff) return 1; - memcpy(&ipv4_addr, addr->s6_addr + 12, 4); + ipv4_addr = addr->s6_addr32[3]; return (ipv6_addr_v4mapped(addr) && ipv4_is_multicast(ipv4_addr)); } From 02ef1dd301c2f1ca0ffe58f75bf2f0f4bd291270 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 11 Oct 2017 10:48:45 -0700 Subject: [PATCH 0541/1013] RDMA/cma: Avoid triggering undefined behavior [ Upstream commit c0b64f58e8d49570aa9ee55d880f92c20ff0166b ] According to the C standard the behavior of computations with integer operands is as follows: * A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type. * The behavior for signed integer underflow and overflow is undefined. Hence only use unsigned integers when checking for integer overflow. This patch is what I came up with after having analyzed the following smatch warnings: drivers/infiniband/core/cma.c:3448: cma_resolve_ib_udp() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len' drivers/infiniband/core/cma.c:3505: cma_connect_ib() warn: signed overflow undefined. 'offset + conn_param->private_data_len < conn_param->private_data_len' Signed-off-by: Bart Van Assche Acked-by: Sean Hefty Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/cma.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 852c8fec8088..fa79c7076ccd 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -1540,7 +1540,7 @@ static struct rdma_id_private *cma_id_from_event(struct ib_cm_id *cm_id, return id_priv; } -static inline int cma_user_data_offset(struct rdma_id_private *id_priv) +static inline u8 cma_user_data_offset(struct rdma_id_private *id_priv) { return cma_family(id_priv) == AF_IB ? 
0 : sizeof(struct cma_hdr); } @@ -1942,7 +1942,8 @@ static int cma_req_handler(struct ib_cm_id *cm_id, struct ib_cm_event *ib_event) struct rdma_id_private *listen_id, *conn_id = NULL; struct rdma_cm_event event; struct net_device *net_dev; - int offset, ret; + u8 offset; + int ret; listen_id = cma_id_from_event(cm_id, ib_event, &net_dev); if (IS_ERR(listen_id)) @@ -3440,7 +3441,8 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv, struct ib_cm_sidr_req_param req; struct ib_cm_id *id; void *private_data; - int offset, ret; + u8 offset; + int ret; memset(&req, 0, sizeof req); offset = cma_user_data_offset(id_priv); @@ -3497,7 +3499,8 @@ static int cma_connect_ib(struct rdma_id_private *id_priv, struct rdma_route *route; void *private_data; struct ib_cm_id *id; - int offset, ret; + u8 offset; + int ret; memset(&req, 0, sizeof req); offset = cma_user_data_offset(id_priv); From 023bff1b335860493dce3769c942b0172ea0fb5a Mon Sep 17 00:00:00 2001 From: Alex Vesker Date: Tue, 10 Oct 2017 10:36:41 +0300 Subject: [PATCH 0542/1013] IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop [ Upstream commit b4b678b06f6eef18bff44a338c01870234db0bc9 ] When ndo_open and ndo_stop are called RTNL lock should be held. In this specific case ipoib_ib_dev_open calls the offloaded ndo_open which re-sets the number of TX queue assuming RTNL lock is held. Since RTNL lock is not held, RTNL assert will fail. Signed-off-by: Alex Vesker Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/ulp/ipoib/ipoib_ib.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c index 6cd61638b441..c97384c914a4 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c @@ -1203,10 +1203,15 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv, ipoib_ib_dev_down(dev); if (level == IPOIB_FLUSH_HEAVY) { + rtnl_lock(); if (test_bit(IPOIB_FLAG_INITIALIZED, &priv->flags)) ipoib_ib_dev_stop(dev); - if (ipoib_ib_dev_open(dev) != 0) + + result = ipoib_ib_dev_open(dev); + rtnl_unlock(); + if (result) return; + if (netif_queue_stopped(dev)) netif_start_queue(dev); } From e6a9d261e95f5ba5cda38ca9c6b7e580656032c4 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Thu, 12 Oct 2017 16:12:37 +0200 Subject: [PATCH 0543/1013] icmp: don't fail on fragment reassembly time exceeded [ Upstream commit 258bbb1b0e594ad5f5652cb526b3c63e6a7fad3d ] The ICMP implementation currently replies to an ICMP time exceeded message (type 11) with an ICMP host unreachable message (type 3, code 1). However, time exceeded messages can either represent "time to live exceeded in transit" (code 0) or "fragment reassembly time exceeded" (code 1). Unconditionally replying to "fragment reassembly time exceeded" with host unreachable messages might cause unjustified connection resets which are now easily triggered as UFO has been removed, because, in turn, sending large buffers triggers IP fragmentation. 
The issue can be easily reproduced by running a lot of UDP streams which is likely to trigger IP fragmentation: # start netserver in the test namespace ip netns add test ip netns exec test netserver # create a VETH pair ip link add name veth0 type veth peer name veth0 netns test ip link set veth0 up ip -n test link set veth0 up for i in $(seq 20 29); do # assign addresses to both ends ip addr add dev veth0 192.168.$i.1/24 ip -n test addr add dev veth0 192.168.$i.2/24 # start the traffic netperf -L 192.168.$i.1 -H 192.168.$i.2 -t UDP_STREAM -l 0 & done # wait send_data: data send error: No route to host (errno 113) netperf: send_omni: send_data failed: No route to host We need to differentiate instead: if fragment reassembly time exceeded is reported, we need to silently drop the packet, if time to live exceeded is reported, maintain the current behaviour. In both cases increment the related error count "icmpInTimeExcds". While at it, fix a typo in a comment, and convert the if statement into a switch to mate it more readable. Signed-off-by: Matteo Croce Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/icmp.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index 681e33998e03..3c1570d3e22f 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -782,7 +782,7 @@ static bool icmp_tag_validation(int proto) } /* - * Handle ICMP_DEST_UNREACH, ICMP_TIME_EXCEED, ICMP_QUENCH, and + * Handle ICMP_DEST_UNREACH, ICMP_TIME_EXCEEDED, ICMP_QUENCH, and * ICMP_PARAMETERPROB. */ @@ -810,7 +810,8 @@ static bool icmp_unreach(struct sk_buff *skb) if (iph->ihl < 5) /* Mangled header, drop. */ goto out_err; - if (icmph->type == ICMP_DEST_UNREACH) { + switch (icmph->type) { + case ICMP_DEST_UNREACH: switch (icmph->code & 15) { case ICMP_NET_UNREACH: case ICMP_HOST_UNREACH: @@ -846,8 +847,16 @@ static bool icmp_unreach(struct sk_buff *skb) } if (icmph->code > NR_ICMP_UNREACH) goto out; - } else if (icmph->type == ICMP_PARAMETERPROB) + break; + case ICMP_PARAMETERPROB: info = ntohl(icmph->un.gateway) >> 24; + break; + case ICMP_TIME_EXCEEDED: + __ICMP_INC_STATS(net, ICMP_MIB_INTIMEEXCDS); + if (icmph->code == ICMP_EXC_FRAGTIME) + goto out; + break; + } /* * Throw it at our lower layers From 1b07f7511a773db3a165b0a2eca35df5ba6dd1d3 Mon Sep 17 00:00:00 2001 From: Hans Holmberg Date: Fri, 13 Oct 2017 14:46:34 +0200 Subject: [PATCH 0544/1013] lightnvm: pblk: prevent gc kicks when gc is not operational MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 3e3a5b8ebd5d3b1d68facc58b0674a2564653222 ] GC can be kicked after it has been shut down when closing the last line during exit, resulting in accesses to freed structures. Make sure that GC is not triggered while it is not operational. Also make sure that GC won't be re-activated during exit when running on another processor by using timer_del_sync. 
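The pblk teardown ordering above follows a general two-step rule: first clear the "enabled" flag so no new kicks can re-arm the timer, then wait synchronously so an instance already running on another CPU has finished before the structures are torn down. A user-space sketch of that two-step teardown, using a flag plus a join with pthreads standing in for the kernel timer:

/* Build with: cc -pthread demo.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static atomic_bool gc_enabled = true;
static atomic_int gc_kicks;

static void *gc_timer_thread(void *arg)
{
        (void)arg;
        while (atomic_load(&gc_enabled)) {
                /* Kick GC only while it is operational. */
                atomic_fetch_add(&gc_kicks, 1);
                usleep(1000);
        }
        return NULL;
}

int main(void)
{
        pthread_t t;

        pthread_create(&t, NULL, gc_timer_thread, NULL);
        usleep(10000);

        /* Teardown: clear the enable flag first so nothing re-arms... */
        atomic_store(&gc_enabled, false);
        /* ...then wait synchronously for the running instance to finish,
         * the user-space analogue of del_timer_sync(). */
        pthread_join(t, NULL);

        printf("kicks while enabled: %d\n", atomic_load(&gc_kicks));
        return 0;
}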
Signed-off-by: Hans Holmberg Signed-off-by: Matias Bjørling Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/lightnvm/pblk-gc.c | 9 +++++---- drivers/lightnvm/pblk-init.c | 1 + 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/lightnvm/pblk-gc.c b/drivers/lightnvm/pblk-gc.c index 6090d28f7995..d6bae085e1d2 100644 --- a/drivers/lightnvm/pblk-gc.c +++ b/drivers/lightnvm/pblk-gc.c @@ -486,10 +486,10 @@ void pblk_gc_should_start(struct pblk *pblk) { struct pblk_gc *gc = &pblk->gc; - if (gc->gc_enabled && !gc->gc_active) + if (gc->gc_enabled && !gc->gc_active) { pblk_gc_start(pblk); - - pblk_gc_kick(pblk); + pblk_gc_kick(pblk); + } } /* @@ -628,7 +628,8 @@ void pblk_gc_exit(struct pblk *pblk) flush_workqueue(gc->gc_reader_wq); flush_workqueue(gc->gc_line_reader_wq); - del_timer(&gc->gc_timer); + gc->gc_enabled = 0; + del_timer_sync(&gc->gc_timer); pblk_gc_stop(pblk, 1); if (gc->gc_ts) diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 1b0f61233c21..1a01f201e9e7 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -923,6 +923,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk, pblk->dev = dev; pblk->disk = tdisk; pblk->state = PBLK_STATE_RUNNING; + pblk->gc.gc_enabled = 0; spin_lock_init(&pblk->trans_lock); spin_lock_init(&pblk->lock); From e22f692fbabafe322db0c3821e02a5d6da350bef Mon Sep 17 00:00:00 2001 From: Rakesh Pandit Date: Fri, 13 Oct 2017 14:46:28 +0200 Subject: [PATCH 0545/1013] lightnvm: pblk: fix changing GC group list for a line MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 27b978725d895e704aab44b99242a0514485d798 ] pblk_line_gc_list seems to have had a bug since the introduction of pblk in getting the GC list for a line. In b20ba1bc7 while redesigning the GC algorithm, the naming for the GC thresholds was altered, but the values for high_thrs and mid_thrs were not. The result is that when moving to the GC lists, the mid threshold is never evaluated. Fixes: a4bd217b4 ("lightnvm: physical block device (pblk) target") Signed-off-by: Rakesh Pandit Signed-off-by: Matias Bjørling Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/lightnvm/pblk-init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 1a01f201e9e7..8ccfb95a72ff 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -681,8 +681,8 @@ static int pblk_lines_init(struct pblk *pblk) lm->blk_bitmap_len = BITS_TO_LONGS(geo->nr_luns) * sizeof(long); lm->sec_bitmap_len = BITS_TO_LONGS(lm->sec_per_line) * sizeof(long); lm->lun_bitmap_len = BITS_TO_LONGS(geo->nr_luns) * sizeof(long); - lm->high_thrs = lm->sec_per_line / 2; - lm->mid_thrs = lm->sec_per_line / 4; + lm->mid_thrs = lm->sec_per_line / 2; + lm->high_thrs = lm->sec_per_line / 4; lm->meta_distance = (geo->nr_luns / 2) * pblk->min_write_pgs; /* Calculate necessary pages for smeta.
See comment over struct From 8594a9a79c390fc99c8d8af21b3208396c94b231 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Javier=20Gonz=C3=A1lez?= Date: Fri, 13 Oct 2017 14:46:02 +0200 Subject: [PATCH 0546/1013] lightnvm: pblk: use right flag for GC allocation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 7d327a9ed6c4dca341ebf99012e0a6b80a3050e6 ] The data buffer for the GC path allocates virtual memory through vmalloc. When this change was introduced, a flag signaling kmalloc'ed memory was wrongly introduced. Use the right flag when creating a bio from this buffer. Fixes: de54e703a422 ("lightnvm: pblk: use vmalloc for GC data buffer") Signed-off-by: Javier González Signed-off-by: Matias Bjørling Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/lightnvm/pblk-read.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c index d682e89e6493..ee8efb55b330 100644 --- a/drivers/lightnvm/pblk-read.c +++ b/drivers/lightnvm/pblk-read.c @@ -499,7 +499,7 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data, data_len = (*secs_to_gc) * geo->sec_size; bio = pblk_bio_map_addr(pblk, data, *secs_to_gc, data_len, - PBLK_KMALLOC_META, GFP_KERNEL); + PBLK_VMALLOC_META, GFP_KERNEL); if (IS_ERR(bio)) { pr_err("pblk: could not allocate GC bio (%lu)\n", PTR_ERR(bio)); goto err_free_dma; @@ -519,7 +519,7 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data, if (ret) { bio_endio(bio); pr_err("pblk: GC read request failed\n"); - goto err_free_dma; + goto err_free_bio; } if (!wait_for_completion_io_timeout(&wait, @@ -541,10 +541,13 @@ int pblk_submit_read_gc(struct pblk *pblk, u64 *lba_list, void *data, atomic_long_sub(*secs_to_gc, &pblk->inflight_reads); #endif + bio_put(bio); out: nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list); return NVM_IO_OK; +err_free_bio: + bio_put(bio); err_free_dma: nvm_dev_dma_free(dev->parent, rqd.meta_list, rqd.dma_meta_list); return NVM_IO_ERR; From 83ef2175ba013ab5ec9b1ec50944ba60a6af888d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Javier=20Gonz=C3=A1lez?= Date: Fri, 13 Oct 2017 14:46:01 +0200 Subject: [PATCH 0547/1013] lightnvm: pblk: initialize debug stat counter MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit a1121176ff757e3c073490a69608ea0b18a00ec1 ] Initialize the stat counter for garbage collected reads. 
Fixes: a4bd217b43268 ("lightnvm: physical block device (pblk) target") Signed-off-by: Javier González Signed-off-by: Matias Bjørling Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/lightnvm/pblk-init.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 8ccfb95a72ff..39de6121f46e 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -945,6 +945,7 @@ static void *pblk_init(struct nvm_tgt_dev *dev, struct gendisk *tdisk, atomic_long_set(&pblk->recov_writes, 0); atomic_long_set(&pblk->recov_writes, 0); atomic_long_set(&pblk->recov_gc_writes, 0); + atomic_long_set(&pblk->recov_gc_reads, 0); #endif atomic_long_set(&pblk->read_failed, 0); From 136e415e7a6a7d3d8666560b2147145c7636a455 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Javier=20Gonz=C3=A1lez?= Date: Fri, 13 Oct 2017 14:46:06 +0200 Subject: [PATCH 0548/1013] lightnvm: pblk: fix min size for page mempool MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit bd432417681a224d9fa4a9d43be7d4edc82135b2 ] pblk uses an internal page mempool for allocating pages on internal bios. The main two users of this memory pool are partial reads (reads with some sectors in cache and some on media) and padded writes, which need to add dummy pages to an existing bio already containing valid data (and with a large enough bioset allocated). In both cases, the maximum number of pages per bio is defined by the maximum number of physical sectors supported by the underlying device. This patch fixes a bad mempool allocation, where the min_nr of elements on the pool was fixed (to 16), which is lower than the maximum number of sectors supported by NVMe (as of the time for this patch). Instead, use the maximum number of allowed sectors reported by the device. 
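Put as code, the sizing the patch switches to is roughly the following (mirroring the hunk below; nvm_max_phys_sects() is the lightnvm helper that reports the device limit):

  pblk->page_bio_pool = mempool_create_page_pool(nvm_max_phys_sects(dev), 0);
  if (!pblk->page_bio_pool)
          return -ENOMEM;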
Reported-by: Jens Axboe Signed-off-by: Javier González Signed-off-by: Matias Bjørling Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/lightnvm/pblk-core.c | 6 +++--- drivers/lightnvm/pblk-init.c | 15 ++++++++------- drivers/lightnvm/pblk-read.c | 2 +- drivers/lightnvm/pblk.h | 2 +- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index 81501644fb15..bb3e9d5d4276 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -193,7 +193,7 @@ void pblk_bio_free_pages(struct pblk *pblk, struct bio *bio, int off, bio_advance(bio, off * PBLK_EXPOSED_PAGE_SIZE); for (i = off; i < nr_pages + off; i++) { bv = bio->bi_io_vec[i]; - mempool_free(bv.bv_page, pblk->page_pool); + mempool_free(bv.bv_page, pblk->page_bio_pool); } } @@ -205,14 +205,14 @@ int pblk_bio_add_pages(struct pblk *pblk, struct bio *bio, gfp_t flags, int i, ret; for (i = 0; i < nr_pages; i++) { - page = mempool_alloc(pblk->page_pool, flags); + page = mempool_alloc(pblk->page_bio_pool, flags); if (!page) goto err; ret = bio_add_pc_page(q, bio, page, PBLK_EXPOSED_PAGE_SIZE, 0); if (ret != PBLK_EXPOSED_PAGE_SIZE) { pr_err("pblk: could not add page to bio\n"); - mempool_free(page, pblk->page_pool); + mempool_free(page, pblk->page_bio_pool); goto err; } } diff --git a/drivers/lightnvm/pblk-init.c b/drivers/lightnvm/pblk-init.c index 39de6121f46e..1b75675ee67b 100644 --- a/drivers/lightnvm/pblk-init.c +++ b/drivers/lightnvm/pblk-init.c @@ -132,7 +132,6 @@ static int pblk_rwb_init(struct pblk *pblk) } /* Minimum pages needed within a lun */ -#define PAGE_POOL_SIZE 16 #define ADDR_POOL_SIZE 64 static int pblk_set_ppaf(struct pblk *pblk) @@ -247,14 +246,16 @@ static int pblk_core_init(struct pblk *pblk) if (pblk_init_global_caches(pblk)) return -ENOMEM; - pblk->page_pool = mempool_create_page_pool(PAGE_POOL_SIZE, 0); - if (!pblk->page_pool) + /* internal bios can be at most the sectors signaled by the device. 
*/ + pblk->page_bio_pool = mempool_create_page_pool(nvm_max_phys_sects(dev), + 0); + if (!pblk->page_bio_pool) return -ENOMEM; pblk->line_ws_pool = mempool_create_slab_pool(PBLK_WS_POOL_SIZE, pblk_blk_ws_cache); if (!pblk->line_ws_pool) - goto free_page_pool; + goto free_page_bio_pool; pblk->rec_pool = mempool_create_slab_pool(geo->nr_luns, pblk_rec_cache); if (!pblk->rec_pool) @@ -309,8 +310,8 @@ static int pblk_core_init(struct pblk *pblk) mempool_destroy(pblk->rec_pool); free_blk_ws_pool: mempool_destroy(pblk->line_ws_pool); -free_page_pool: - mempool_destroy(pblk->page_pool); +free_page_bio_pool: + mempool_destroy(pblk->page_bio_pool); return -ENOMEM; } @@ -322,7 +323,7 @@ static void pblk_core_free(struct pblk *pblk) if (pblk->bb_wq) destroy_workqueue(pblk->bb_wq); - mempool_destroy(pblk->page_pool); + mempool_destroy(pblk->page_bio_pool); mempool_destroy(pblk->line_ws_pool); mempool_destroy(pblk->rec_pool); mempool_destroy(pblk->g_rq_pool); diff --git a/drivers/lightnvm/pblk-read.c b/drivers/lightnvm/pblk-read.c index ee8efb55b330..402c732f0970 100644 --- a/drivers/lightnvm/pblk-read.c +++ b/drivers/lightnvm/pblk-read.c @@ -238,7 +238,7 @@ static int pblk_fill_partial_read_bio(struct pblk *pblk, struct nvm_rq *rqd, kunmap_atomic(src_p); kunmap_atomic(dst_p); - mempool_free(src_bv.bv_page, pblk->page_pool); + mempool_free(src_bv.bv_page, pblk->page_bio_pool); hole = find_next_zero_bit(read_bitmap, nr_secs, hole + 1); } while (hole < nr_secs); diff --git a/drivers/lightnvm/pblk.h b/drivers/lightnvm/pblk.h index 67e623bd5c2d..053164deb072 100644 --- a/drivers/lightnvm/pblk.h +++ b/drivers/lightnvm/pblk.h @@ -618,7 +618,7 @@ struct pblk { struct list_head compl_list; - mempool_t *page_pool; + mempool_t *page_bio_pool; mempool_t *line_ws_pool; mempool_t *rec_pool; mempool_t *g_rq_pool; From 99ab42f783da58540c80c91a2af8e8c764c46ca8 Mon Sep 17 00:00:00 2001 From: Rakesh Pandit Date: Fri, 13 Oct 2017 14:45:56 +0200 Subject: [PATCH 0549/1013] lightnvm: pblk: protect line bitmap while submitting meta io MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit e57903fd972a398b7140d0bc055714e13a0e58c5 ] It seems pblk_dealloc_page would race against pblk_alloc_pages for line bitmap for sector allocation.The chances are very low but might as well protect the bitmap properly. 
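A generic sketch of why the lock is needed (illustrative names, same shape as the hunk below): the bitmap lookup and the subsequent bit updates are not one atomic step, so both the allocation and deallocation paths must serialize on the same line lock.

  spin_lock(&line->lock);
  addr = find_next_zero_bit(line->map_bitmap, nr_bits, line->cur_sec);
  /* ... rewind line->cur_sec and test_and_clear_bit() under the same lock ... */
  spin_unlock(&line->lock);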
Signed-off-by: Rakesh Pandit Reviewed-by: Javier González Signed-off-by: Matias Bjørling Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/lightnvm/pblk-core.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/lightnvm/pblk-core.c b/drivers/lightnvm/pblk-core.c index bb3e9d5d4276..3f0ddc0d7393 100644 --- a/drivers/lightnvm/pblk-core.c +++ b/drivers/lightnvm/pblk-core.c @@ -486,12 +486,14 @@ void pblk_dealloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs) u64 addr; int i; + spin_lock(&line->lock); addr = find_next_zero_bit(line->map_bitmap, pblk->lm.sec_per_line, line->cur_sec); line->cur_sec = addr - nr_secs; for (i = 0; i < nr_secs; i++, line->cur_sec--) WARN_ON(!test_and_clear_bit(line->cur_sec, line->map_bitmap)); + spin_unlock(&line->lock); } u64 __pblk_alloc_page(struct pblk *pblk, struct pblk_line *line, int nr_secs) From e37eb54a000c767e7f39b806fb048c4be75ed786 Mon Sep 17 00:00:00 2001 From: Miaoqing Pan Date: Wed, 27 Sep 2017 09:13:34 +0800 Subject: [PATCH 0550/1013] ath9k: fix tx99 potential info leak MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit ee0a47186e2fa9aa1c56cadcea470ca0ba8c8692 ] When the user sets count to zero the string buffer would remain completely uninitialized which causes the kernel to parse its own stack data, potentially leading to an info leak. In addition to that, the string might be not terminated properly when the user data does not contain a 0-terminator. Signed-off-by: Miaoqing Pan Reviewed-by: Christoph Böhmwalder Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/ath9k/tx99.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/wireless/ath/ath9k/tx99.c b/drivers/net/wireless/ath/ath9k/tx99.c index 49ed1afb913c..fe3a8263b224 100644 --- a/drivers/net/wireless/ath/ath9k/tx99.c +++ b/drivers/net/wireless/ath/ath9k/tx99.c @@ -179,6 +179,9 @@ static ssize_t write_file_tx99(struct file *file, const char __user *user_buf, ssize_t len; int r; + if (count < 1) + return -EINVAL; + if (sc->cur_chan->nvifs > 1) return -EOPNOTSUPP; @@ -186,6 +189,8 @@ static ssize_t write_file_tx99(struct file *file, const char __user *user_buf, if (copy_from_user(buf, user_buf, len)) return -EFAULT; + buf[len] = '\0'; + if (strtobool(buf, &start)) return -EINVAL; From 5c212b59ded7fc9382e1c55c100f82709cf542ca Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Wed, 4 Oct 2017 12:22:55 +0300 Subject: [PATCH 0551/1013] ath10k: fix core PCI suspend when WoWLAN is supported but disabled [ Upstream commit 96378bd2c6cda5f04d0f6da2cd35d4670a982c38 ] For devices where the FW supports WoWLAN but user-space has not configured it, we don't do any PCI-specific suspend/resume operations, because mac80211 doesn't call drv_suspend() when !wowlan. This has particularly bad effects for some platforms, because we don't stop the power-save timer, and if this timer goes off after the PCI controller has suspended the link, Bad Things will happen. Commit 32faa3f0ee50 ("ath10k: add the PCI PM core suspend/resume ops") got some of this right, in that it understood there was a problem on non-WoWLAN firmware. But it forgot the $subject case. Fix this by moving all the PCI driver suspend/resume logic exclusively into the driver PM hooks. This shouldn't affect WoWLAN support much (this just gets executed later on). 
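A generic sketch of what "driver PM hooks" refers to above (foo_* names are illustrative, not the ath10k symbols): dev_pm_ops attached to the PCI driver, which the PM core invokes on every system suspend/resume, independent of whether WoWLAN is configured.

  #include <linux/pci.h>
  #include <linux/pm.h>

  static int foo_pci_pm_suspend(struct device *dev)
  {
          return 0;       /* bus-level quiesce would go here */
  }

  static int foo_pci_pm_resume(struct device *dev)
  {
          return 0;       /* bus-level wake-up would go here */
  }

  static SIMPLE_DEV_PM_OPS(foo_pci_pm_ops, foo_pci_pm_suspend, foo_pci_pm_resume);

  static struct pci_driver foo_pci_driver = {
          .name      = "foo_pci",
          .driver.pm = &foo_pci_pm_ops,
          /* .id_table, .probe, .remove ... */
  };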
I would just as well kill the entirety of ath10k_hif_suspend(), as it's not even implemented on the USB or SDIO drivers. I expect that we don't need the callback, except to return "supported" (i.e., 0) or "not supported" (i.e., -EOPNOTSUPP). Fixes: 32faa3f0ee50 ("ath10k: add the PCI PM core suspend/resume ops") Fixes: 77258d409ce4 ("ath10k: enable pci soc powersaving") Signed-off-by: Brian Norris Cc: Ryan Hsu Cc: Kalle Valo Cc: Michal Kazior Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/ath10k/pci.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c index 195dafb98131..b18a9b690df4 100644 --- a/drivers/net/wireless/ath/ath10k/pci.c +++ b/drivers/net/wireless/ath/ath10k/pci.c @@ -2580,6 +2580,12 @@ void ath10k_pci_hif_power_down(struct ath10k *ar) #ifdef CONFIG_PM static int ath10k_pci_hif_suspend(struct ath10k *ar) +{ + /* Nothing to do; the important stuff is in the driver suspend. */ + return 0; +} + +static int ath10k_pci_suspend(struct ath10k *ar) { /* The grace timer can still be counting down and ar->ps_awake be true. * It is known that the device may be asleep after resuming regardless @@ -2592,6 +2598,12 @@ static int ath10k_pci_hif_suspend(struct ath10k *ar) } static int ath10k_pci_hif_resume(struct ath10k *ar) +{ + /* Nothing to do; the important stuff is in the driver resume. */ + return 0; +} + +static int ath10k_pci_resume(struct ath10k *ar) { struct ath10k_pci *ar_pci = ath10k_pci_priv(ar); struct pci_dev *pdev = ar_pci->pdev; @@ -3401,11 +3413,7 @@ static __maybe_unused int ath10k_pci_pm_suspend(struct device *dev) struct ath10k *ar = dev_get_drvdata(dev); int ret; - if (test_bit(ATH10K_FW_FEATURE_WOWLAN_SUPPORT, - ar->running_fw->fw_file.fw_features)) - return 0; - - ret = ath10k_hif_suspend(ar); + ret = ath10k_pci_suspend(ar); if (ret) ath10k_warn(ar, "failed to suspend hif: %d\n", ret); @@ -3417,11 +3425,7 @@ static __maybe_unused int ath10k_pci_pm_resume(struct device *dev) struct ath10k *ar = dev_get_drvdata(dev); int ret; - if (test_bit(ATH10K_FW_FEATURE_WOWLAN_SUPPORT, - ar->running_fw->fw_file.fw_features)) - return 0; - - ret = ath10k_hif_resume(ar); + ret = ath10k_pci_resume(ar); if (ret) ath10k_warn(ar, "failed to resume hif: %d\n", ret); From d094690cbed906ca96752cb94abed50fbf1cfaa3 Mon Sep 17 00:00:00 2001 From: Brian Norris Date: Thu, 19 Oct 2017 11:45:19 -0700 Subject: [PATCH 0552/1013] ath10k: fix build errors with !CONFIG_PM [ Upstream commit 20665a9076d48e9abd9a2db13d307f58f7ef6647 ] Build errors have been reported with CONFIG_PM=n: drivers/net/wireless/ath/ath10k/pci.c:3416:8: error: implicit declaration of function 'ath10k_pci_suspend' [-Werror=implicit-function-declaration] drivers/net/wireless/ath/ath10k/pci.c:3428:8: error: implicit declaration of function 'ath10k_pci_resume' [-Werror=implicit-function-declaration] These are caused by the combination of the following two commits: 6af1de2e4ec4 ("ath10k: mark PM functions as __maybe_unused") 96378bd2c6cd ("ath10k: fix core PCI suspend when WoWLAN is supported but disabled") Both build fine on their own. But now that ath10k_pci_pm_{suspend,resume}() is compiled unconditionally, we should also compile ath10k_pci_{suspend,resume}() unconditionally. And drop the #ifdef around ath10k_pci_hif_{suspend,resume}() too; they are trivial (empty), so we're not saving much space by compiling them out. 
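The __maybe_unused annotation referred to around these two commits works like this in its generic form (illustrative name): the callback is always compiled, so a CONFIG_PM=n build has nothing left undefined, and the attribute only silences the "defined but not used" warning when the PM macros compile the reference away.

  static int __maybe_unused foo_pm_suspend(struct device *dev)
  {
          /* always built; only referenced when CONFIG_PM_SLEEP=y */
          return 0;
  }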
And the alternatives would be to sprinkle more __maybe_unused, or spread the #ifdef's further. Build tested with the following combinations: CONFIG_PM=y && CONFIG_PM_SLEEP=y CONFIG_PM=y && CONFIG_PM_SLEEP=n CONFIG_PM=n Fixes: 96378bd2c6cd ("ath10k: fix core PCI suspend when WoWLAN is supported but disabled") Fixes: 096ad2a15fd8 ("Merge branch 'ath-next'") Signed-off-by: Brian Norris Signed-off-by: Kalle Valo Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/wireless/ath/ath10k/pci.c | 5 ----- 1 file changed, 5 deletions(-) diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c index b18a9b690df4..d790ea20b95d 100644 --- a/drivers/net/wireless/ath/ath10k/pci.c +++ b/drivers/net/wireless/ath/ath10k/pci.c @@ -2577,8 +2577,6 @@ void ath10k_pci_hif_power_down(struct ath10k *ar) */ } -#ifdef CONFIG_PM - static int ath10k_pci_hif_suspend(struct ath10k *ar) { /* Nothing to do; the important stuff is in the driver suspend. */ @@ -2627,7 +2625,6 @@ static int ath10k_pci_resume(struct ath10k *ar) return ret; } -#endif static bool ath10k_pci_validate_cal(void *data, size_t size) { @@ -2782,10 +2779,8 @@ static const struct ath10k_hif_ops ath10k_pci_hif_ops = { .power_down = ath10k_pci_hif_power_down, .read32 = ath10k_pci_read32, .write32 = ath10k_pci_write32, -#ifdef CONFIG_PM .suspend = ath10k_pci_hif_suspend, .resume = ath10k_pci_hif_resume, -#endif .fetch_cal_eeprom = ath10k_pci_hif_fetch_cal_eeprom, }; From bed6119aa21b28f096880342f9e2445f9ad47b25 Mon Sep 17 00:00:00 2001 From: Bin Liu Date: Tue, 5 Dec 2017 08:45:30 -0600 Subject: [PATCH 0553/1013] usb: musb: da8xx: fix babble condition handling commit bd3486ded7a0c313a6575343e6c2b21d14476645 upstream. When babble condition happens, the musb controller might automatically turns off VBUS. On DA8xx platform, the controller generates drvvbus interrupt for turning off VBUS along with the babble interrupt. In this case, we should handle the babble interrupt first and recover from the babble condition. This change ignores the drvvbus interrupt if babble interrupt is also generated at the same time, so the babble recovery routine works properly. Signed-off-by: Bin Liu Signed-off-by: Greg Kroah-Hartman --- drivers/usb/musb/da8xx.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/usb/musb/da8xx.c b/drivers/usb/musb/da8xx.c index df88123274ca..972bf4210189 100644 --- a/drivers/usb/musb/da8xx.c +++ b/drivers/usb/musb/da8xx.c @@ -305,7 +305,15 @@ static irqreturn_t da8xx_musb_interrupt(int irq, void *hci) musb->xceiv->otg->state = OTG_STATE_A_WAIT_VRISE; portstate(musb->port1_status |= USB_PORT_STAT_POWER); del_timer(&otg_workaround); - } else { + } else if (!(musb->int_usb & MUSB_INTR_BABBLE)){ + /* + * When babble condition happens, drvvbus interrupt + * is also generated. Ignore this drvvbus interrupt + * and let babble interrupt handler recovers the + * controller; otherwise, the host-mode flag is lost + * due to the MUSB_DEV_MODE() call below and babble + * recovery logic will not called. 
+ */ musb->is_active = 0; MUSB_DEV_MODE(musb); otg->default_a = 0; From 7b3775017f4e6b87dfd2c7f63d1eaf057948f31d Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 20 Dec 2017 10:10:38 +0100 Subject: [PATCH 0554/1013] Linux 4.14.8 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 39d7af0165a8..97b5ae76ac8c 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 7 +SUBLEVEL = 8 EXTRAVERSION = NAME = Petit Gorille From c09061aec2e5245f547ea6eecf321e172e42b7fb Mon Sep 17 00:00:00 2001 From: Uros Bizjak Date: Wed, 6 Sep 2017 17:18:08 +0200 Subject: [PATCH 0555/1013] x86/asm: Remove unnecessary \n\t in front of CC_SET() from asm templates commit 3c52b5c64326d9dcfee4e10611c53ec1b1b20675 upstream. There is no need for \n\t in front of CC_SET(), as the macro already includes these two. Signed-off-by: Uros Bizjak Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170906151808.5634-1-ubizjak@gmail.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/archrandom.h | 8 ++++---- arch/x86/include/asm/bitops.h | 10 +++++----- arch/x86/include/asm/percpu.h | 2 +- arch/x86/include/asm/rmwcc.h | 2 +- 4 files changed, 11 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h index 5b0579abb398..3ac991d81e74 100644 --- a/arch/x86/include/asm/archrandom.h +++ b/arch/x86/include/asm/archrandom.h @@ -45,7 +45,7 @@ static inline bool rdrand_long(unsigned long *v) bool ok; unsigned int retry = RDRAND_RETRY_LOOPS; do { - asm volatile(RDRAND_LONG "\n\t" + asm volatile(RDRAND_LONG CC_SET(c) : CC_OUT(c) (ok), "=a" (*v)); if (ok) @@ -59,7 +59,7 @@ static inline bool rdrand_int(unsigned int *v) bool ok; unsigned int retry = RDRAND_RETRY_LOOPS; do { - asm volatile(RDRAND_INT "\n\t" + asm volatile(RDRAND_INT CC_SET(c) : CC_OUT(c) (ok), "=a" (*v)); if (ok) @@ -71,7 +71,7 @@ static inline bool rdrand_int(unsigned int *v) static inline bool rdseed_long(unsigned long *v) { bool ok; - asm volatile(RDSEED_LONG "\n\t" + asm volatile(RDSEED_LONG CC_SET(c) : CC_OUT(c) (ok), "=a" (*v)); return ok; @@ -80,7 +80,7 @@ static inline bool rdseed_long(unsigned long *v) static inline bool rdseed_int(unsigned int *v) { bool ok; - asm volatile(RDSEED_INT "\n\t" + asm volatile(RDSEED_INT CC_SET(c) : CC_OUT(c) (ok), "=a" (*v)); return ok; diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h index 2bcf47314959..3fa039855b8f 100644 --- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -143,7 +143,7 @@ static __always_inline void __clear_bit(long nr, volatile unsigned long *addr) static __always_inline bool clear_bit_unlock_is_negative_byte(long nr, volatile unsigned long *addr) { bool negative; - asm volatile(LOCK_PREFIX "andb %2,%1\n\t" + asm volatile(LOCK_PREFIX "andb %2,%1" CC_SET(s) : CC_OUT(s) (negative), ADDR : "ir" ((char) ~(1 << nr)) : "memory"); @@ -246,7 +246,7 @@ static __always_inline bool __test_and_set_bit(long nr, volatile unsigned long * { bool oldbit; - asm("bts %2,%1\n\t" + asm("bts %2,%1" CC_SET(c) : CC_OUT(c) (oldbit), ADDR : "Ir" (nr)); @@ -286,7 +286,7 @@ static __always_inline bool __test_and_clear_bit(long nr, volatile unsigned long { bool oldbit; - asm volatile("btr %2,%1\n\t" + asm volatile("btr %2,%1" CC_SET(c) : CC_OUT(c) (oldbit), ADDR : "Ir" (nr)); @@ -298,7 +298,7 @@ static __always_inline bool 
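For context, the "standard iret frame" at that point is just what the hardware pushes on an exception (a descriptive sketch, not kernel code; the error-code slot exists only for vectors that push one):

  struct hw_exception_frame {             /* top of stack at handler entry */
          unsigned long error_code;       /* only for some vectors */
          unsigned long rip;
          unsigned long cs;
          unsigned long rflags;
          unsigned long rsp;
          unsigned long ss;
  };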
__test_and_change_bit(long nr, volatile unsigned lon { bool oldbit; - asm volatile("btc %2,%1\n\t" + asm volatile("btc %2,%1" CC_SET(c) : CC_OUT(c) (oldbit), ADDR : "Ir" (nr) : "memory"); @@ -329,7 +329,7 @@ static __always_inline bool variable_test_bit(long nr, volatile const unsigned l { bool oldbit; - asm volatile("bt %2,%1\n\t" + asm volatile("bt %2,%1" CC_SET(c) : CC_OUT(c) (oldbit) : "m" (*(unsigned long *)addr), "Ir" (nr)); diff --git a/arch/x86/include/asm/percpu.h b/arch/x86/include/asm/percpu.h index 377f1ffd18be..ba3c523aaf16 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -526,7 +526,7 @@ static inline bool x86_this_cpu_variable_test_bit(int nr, { bool oldbit; - asm volatile("bt "__percpu_arg(2)",%1\n\t" + asm volatile("bt "__percpu_arg(2)",%1" CC_SET(c) : CC_OUT(c) (oldbit) : "m" (*(unsigned long __percpu *)addr), "Ir" (nr)); diff --git a/arch/x86/include/asm/rmwcc.h b/arch/x86/include/asm/rmwcc.h index d8f3a6ae9f6c..f91c365e57c3 100644 --- a/arch/x86/include/asm/rmwcc.h +++ b/arch/x86/include/asm/rmwcc.h @@ -29,7 +29,7 @@ cc_label: \ #define __GEN_RMWcc(fullop, var, cc, clobbers, ...) \ do { \ bool c; \ - asm volatile (fullop ";" CC_SET(cc) \ + asm volatile (fullop CC_SET(cc) \ : [counter] "+m" (var), CC_OUT(cc) (c) \ : __VA_ARGS__ : clobbers); \ return c; \ From 42314edefac8a80efb3d00a9f83bf1b1871bd473 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:30 -0500 Subject: [PATCH 0556/1013] objtool: Don't report end of section error after an empty unwind hint commit 00d96180dc38ef872ac471c2d3e14b067cbd895d upstream. If asm code specifies an UNWIND_HINT_EMPTY hint, don't warn if the section ends unexpectedly. This can happen with the xen-head.S code because the hypercall_page is "text" but it's all zeros. Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Jiri Slaby Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/ddafe199dd8797e40e3c2777373347eba1d65572.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/objtool/check.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/tools/objtool/check.c b/tools/objtool/check.c index c0e26ad1fa7e..9b341584eb1b 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -1757,11 +1757,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first, if (insn->dead_end) return 0; - insn = next_insn; - if (!insn) { + if (!next_insn) { + if (state.cfa.base == CFI_UNDEFINED) + return 0; WARN("%s: unexpected end of section", sec->name); return 1; } + + insn = next_insn; } return 0; From 9cf5a88b165e203d8c7d5284640ea6aa61baf78a Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:31 -0500 Subject: [PATCH 0557/1013] x86/head: Remove confusing comment commit 17270717e80de33a884ad328fea5f407d87f6d6a upstream. This comment is actively wrong and confusing. It refers to the registers' stack offsets after the pt_regs has been constructed on the stack, but this code is *before* that. At this point the stack just has the standard iret frame, for which no comment should be needed. 
Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Jiri Slaby Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/a3c267b770fc56c9b86df9c11c552848248aace2.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/head_64.S | 4 ---- 1 file changed, 4 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 6dde3f3fc1f8..41d6146dbd5a 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -271,10 +271,6 @@ bad_address: __INIT ENTRY(early_idt_handler_array) - # 104(%rsp) %rflags - # 96(%rsp) %cs - # 88(%rsp) %rip - # 80(%rsp) error code i = 0 .rept NUM_EXCEPTION_VECTORS .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 From 98ce8eee6021e32ba8f606ed7de2a9c0f3807dcf Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:32 -0500 Subject: [PATCH 0558/1013] x86/head: Remove unused 'bad_address' code commit a8b88e84d124bc92c4808e72b8b8c0e0bb538630 upstream. It's no longer possible for this code to be executed, so remove it. Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Jiri Slaby Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/32a46fe92d2083700599b36872b26e7dfd7b7965.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/head_64.S | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 41d6146dbd5a..fa800dbeb707 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -266,9 +266,6 @@ ENDPROC(start_cpu0) .quad init_thread_union + THREAD_SIZE - SIZEOF_PTREGS __FINITDATA -bad_address: - jmp bad_address - __INIT ENTRY(early_idt_handler_array) i = 0 From 8233afff9b45feca9020b2fc52f7967da5a13907 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:33 -0500 Subject: [PATCH 0559/1013] x86/head: Fix head ELF function annotations commit 015a2ea5478680fc5216d56b7ff306f2a74efaf9 upstream. These functions aren't callable C-type functions, so don't annotate them as such. 
Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Jiri Slaby Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/36eb182738c28514f8bf95e403d89b6413a88883.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/head_64.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index fa800dbeb707..2d83ff6e5158 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -235,7 +235,7 @@ ENTRY(secondary_startup_64) pushq %rax # target address in negative space lretq .Lafter_lret: -ENDPROC(secondary_startup_64) +END(secondary_startup_64) #include "verify_cpu.S" @@ -278,7 +278,7 @@ ENTRY(early_idt_handler_array) i = i + 1 .fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc .endr -ENDPROC(early_idt_handler_array) +END(early_idt_handler_array) early_idt_handler_common: /* @@ -321,7 +321,7 @@ early_idt_handler_common: 20: decl early_recursion_flag(%rip) jmp restore_regs_and_iret -ENDPROC(early_idt_handler_common) +END(early_idt_handler_common) __INITDATA From aad9d83f9dcbc896a034f2676d3d2b661b0ca5cf Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:34 -0500 Subject: [PATCH 0560/1013] x86/boot: Annotate verify_cpu() as a callable function commit e93db75a0054b23a874a12c63376753544f3fe9e upstream. verify_cpu() is a callable function. Annotate it as such. Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Jiri Slaby Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/293024b8a080832075312f38c07ccc970fc70292.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/verify_cpu.S | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/verify_cpu.S b/arch/x86/kernel/verify_cpu.S index 014ea59aa153..3d3c2f71f617 100644 --- a/arch/x86/kernel/verify_cpu.S +++ b/arch/x86/kernel/verify_cpu.S @@ -33,7 +33,7 @@ #include #include -verify_cpu: +ENTRY(verify_cpu) pushf # Save caller passed flags push $0 # Kill any dangerous flags popf @@ -139,3 +139,4 @@ verify_cpu: popf # Restore caller passed flags xorl %eax, %eax ret +ENDPROC(verify_cpu) From 2c9863c1687b2e1398d943351ede4ad45ab77cbe Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:35 -0500 Subject: [PATCH 0561/1013] x86/xen: Fix xen head ELF annotations commit 2582d3df95c76d3b686453baf90b64d57e87d1e8 upstream. Mark the ends of the startup_xen and hypercall_page code sections. 
Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Jiri Slaby Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/3a80a394d30af43d9cefa1a29628c45ed8420c97.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/xen-head.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S index b5b8d7f43557..dc1b8439617f 100644 --- a/arch/x86/xen/xen-head.S +++ b/arch/x86/xen/xen-head.S @@ -34,7 +34,7 @@ ENTRY(startup_xen) mov $init_thread_union+THREAD_SIZE, %_ASM_SP jmp xen_start_kernel - +END(startup_xen) __FINIT #endif @@ -48,7 +48,7 @@ ENTRY(hypercall_page) .type xen_hypercall_##n, @function; .size xen_hypercall_##n, 32 #include #undef HYPERCALL - +END(hypercall_page) .popsection ELFNOTE(Xen, XEN_ELFNOTE_GUEST_OS, .asciz "linux") From d074a1075f6a123e3f9c5fdbac7eea4c1006e03b Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:36 -0500 Subject: [PATCH 0562/1013] x86/xen: Add unwind hint annotations commit abbe1cac6214d81d2f4e149aba64a8760703144e upstream. Add unwind hint annotations to the xen head code so the ORC unwinder can read head_64.o. hypercall_page needs empty annotations at 32-byte intervals to match the 'xen_hypercall_*' ELF functions at those locations. Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Jiri Slaby Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/70ed2eb516fe9266be766d953f93c2571bca88cc.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/xen-head.S | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/x86/xen/xen-head.S b/arch/x86/xen/xen-head.S index dc1b8439617f..497cc55a0c16 100644 --- a/arch/x86/xen/xen-head.S +++ b/arch/x86/xen/xen-head.S @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -20,6 +21,7 @@ #ifdef CONFIG_XEN_PV __INIT ENTRY(startup_xen) + UNWIND_HINT_EMPTY cld /* Clear .bss */ @@ -41,7 +43,10 @@ END(startup_xen) .pushsection .text .balign PAGE_SIZE ENTRY(hypercall_page) - .skip PAGE_SIZE + .rept (PAGE_SIZE / 32) + UNWIND_HINT_EMPTY + .skip 32 + .endr #define HYPERCALL(n) \ .equ xen_hypercall_##n, hypercall_page + __HYPERVISOR_##n * 32; \ From 5d5e60c80fd8613b8a76876b3ffd357f7a0bf7e7 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Sep 2017 21:43:37 -0500 Subject: [PATCH 0563/1013] x86/head: Add unwind hint annotations commit 2704fbb672d0d9a19414907fda7949283dcef6a1 upstream. Jiri Slaby reported an ORC issue when unwinding from an idle task. The stack was: ffffffff811083c2 do_idle+0x142/0x1e0 ffffffff8110861d cpu_startup_entry+0x5d/0x60 ffffffff82715f58 start_kernel+0x3ff/0x407 ffffffff827153e8 x86_64_start_kernel+0x14e/0x15d ffffffff810001bf secondary_startup_64+0x9f/0xa0 The ORC unwinder errored out at secondary_startup_64 because the head code isn't annotated yet so there wasn't a corresponding ORC entry. Fix that and any other head-related unwinding issues by adding unwind hints to the head code. 
Reported-by: Jiri Slaby Tested-by: Jiri Slaby Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/78ef000a2f68f545d6eef44ee912edceaad82ccf.1505764066.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/Makefile | 1 - arch/x86/kernel/head_64.S | 14 ++++++++++++-- 2 files changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index 5f70044340ff..fbb0cbd4f2aa 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -27,7 +27,6 @@ KASAN_SANITIZE_dumpstack.o := n KASAN_SANITIZE_dumpstack_$(BITS).o := n KASAN_SANITIZE_stacktrace.o := n -OBJECT_FILES_NON_STANDARD_head_$(BITS).o := y OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y OBJECT_FILES_NON_STANDARD_test_nx.o := y diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 2d83ff6e5158..6733cac7cba5 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -50,6 +50,7 @@ L3_START_KERNEL = pud_index(__START_KERNEL_map) .code64 .globl startup_64 startup_64: + UNWIND_HINT_EMPTY /* * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0, * and someone has loaded an identity mapped page table @@ -89,6 +90,7 @@ startup_64: addq $(early_top_pgt - __START_KERNEL_map), %rax jmp 1f ENTRY(secondary_startup_64) + UNWIND_HINT_EMPTY /* * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0, * and someone has loaded a mapped page table. @@ -133,6 +135,7 @@ ENTRY(secondary_startup_64) movq $1f, %rax jmp *%rax 1: + UNWIND_HINT_EMPTY /* Check if nx is implemented */ movl $0x80000001, %eax @@ -247,6 +250,7 @@ END(secondary_startup_64) */ ENTRY(start_cpu0) movq initial_stack(%rip), %rsp + UNWIND_HINT_EMPTY jmp .Ljump_to_C_code ENDPROC(start_cpu0) #endif @@ -271,13 +275,18 @@ ENTRY(early_idt_handler_array) i = 0 .rept NUM_EXCEPTION_VECTORS .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 - pushq $0 # Dummy error code, to make stack frame uniform + UNWIND_HINT_IRET_REGS + pushq $0 # Dummy error code, to make stack frame uniform + .else + UNWIND_HINT_IRET_REGS offset=8 .endif pushq $i # 72(%rsp) Vector number jmp early_idt_handler_common + UNWIND_HINT_IRET_REGS i = i + 1 .fill early_idt_handler_array + i*EARLY_IDT_HANDLER_SIZE - ., 1, 0xcc .endr + UNWIND_HINT_IRET_REGS offset=16 END(early_idt_handler_array) early_idt_handler_common: @@ -306,6 +315,7 @@ early_idt_handler_common: pushq %r13 /* pt_regs->r13 */ pushq %r14 /* pt_regs->r14 */ pushq %r15 /* pt_regs->r15 */ + UNWIND_HINT_REGS cmpq $14,%rsi /* Page fault? */ jnz 10f @@ -428,7 +438,7 @@ ENTRY(phys_base) EXPORT_SYMBOL(phys_base) #include "../../x86/xen/xen-head.S" - + __PAGE_ALIGNED_BSS NEXT_PAGE(empty_zero_page) .skip PAGE_SIZE From 2cb7165b4dcf80b03d7cc86fedd1a65a455823b3 Mon Sep 17 00:00:00 2001 From: Jan Beulich Date: Mon, 25 Sep 2017 02:06:19 -0600 Subject: [PATCH 0564/1013] ACPI / APEI: adjust a local variable type in ghes_ioremap_pfn_irq() commit 095f613c6b386a1704b73a549e9ba66c1d5381ae upstream. Match up with what 7edda0886b ("acpi: apei: handle SEA notification type for ARMv8") did for ghes_ioremap_pfn_nmi(). Signed-off-by: Jan Beulich Reviewed-by: Borislav Petkov Signed-off-by: Rafael J. 
Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/apei/ghes.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 3c3a37b8503b..69ef0b6bf25d 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -174,7 +174,8 @@ static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) { - unsigned long vaddr, paddr; + unsigned long vaddr; + phys_addr_t paddr; pgprot_t prot; vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); From ab7fc55ef231511ae40adf4c51cec89035dad4db Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 12 Oct 2017 09:24:30 +0200 Subject: [PATCH 0565/1013] x86/unwinder: Make CONFIG_UNWINDER_ORC=y the default in the 64-bit defconfig commit 1e4078f0bba46ad61b69548abe6a6faf63b89380 upstream. Increase testing coverage by turning on the primary x86 unwinder for the 64-bit defconfig. Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/configs/x86_64_defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig index 4a4b16e56d35..eb65c248708d 100644 --- a/arch/x86/configs/x86_64_defconfig +++ b/arch/x86/configs/x86_64_defconfig @@ -299,6 +299,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_DEBUG_RODATA_TEST is not set CONFIG_DEBUG_BOOT_PARAMS=y CONFIG_OPTIMIZE_INLINING=y +CONFIG_ORC_UNWINDER=y CONFIG_SECURITY=y CONFIG_SECURITY_NETWORK=y CONFIG_SECURITY_SELINUX=y From a11adc6a2e0b429adab534c44f68c057d12ecd4d Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Thu, 12 Oct 2017 18:06:19 -0400 Subject: [PATCH 0566/1013] x86/fpu/debug: Remove unused 'x86_fpu_state' and 'x86_fpu_deactivate_state' tracepoints commit 127a1bea40f7f2a36bc7207ea4d51bb6b4e936fa upstream. Commit: d1898b733619 ("x86/fpu: Add tracepoints to dump FPU state at key points") ... added the 'x86_fpu_state' and 'x86_fpu_deactivate_state' trace points, but never used them. Today they are still not used. As they take up and waste memory, remove them. Signed-off-by: Steven Rostedt (VMware) Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20171012180619.670b68b6@gandalf.local.home Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/trace/fpu.h | 10 ---------- 1 file changed, 10 deletions(-) diff --git a/arch/x86/include/asm/trace/fpu.h b/arch/x86/include/asm/trace/fpu.h index fa60398bbc3a..069c04be1507 100644 --- a/arch/x86/include/asm/trace/fpu.h +++ b/arch/x86/include/asm/trace/fpu.h @@ -34,11 +34,6 @@ DECLARE_EVENT_CLASS(x86_fpu, ) ); -DEFINE_EVENT(x86_fpu, x86_fpu_state, - TP_PROTO(struct fpu *fpu), - TP_ARGS(fpu) -); - DEFINE_EVENT(x86_fpu, x86_fpu_before_save, TP_PROTO(struct fpu *fpu), TP_ARGS(fpu) @@ -74,11 +69,6 @@ DEFINE_EVENT(x86_fpu, x86_fpu_activate_state, TP_ARGS(fpu) ); -DEFINE_EVENT(x86_fpu, x86_fpu_deactivate_state, - TP_PROTO(struct fpu *fpu), - TP_ARGS(fpu) -); - DEFINE_EVENT(x86_fpu, x86_fpu_init_state, TP_PROTO(struct fpu *fpu), TP_ARGS(fpu) From 8af220c9e240a47660686161e6d04043ad09c563 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 13 Oct 2017 15:02:00 -0500 Subject: [PATCH 0567/1013] x86/unwind: Rename unwinder config options to 'CONFIG_UNWINDER_*' commit 11af847446ed0d131cf24d16a7ef3d5ea7a49554 upstream. 
Rename the unwinder config options from: CONFIG_ORC_UNWINDER CONFIG_FRAME_POINTER_UNWINDER CONFIG_GUESS_UNWINDER to: CONFIG_UNWINDER_ORC CONFIG_UNWINDER_FRAME_POINTER CONFIG_UNWINDER_GUESS ... in order to give them a more logical config namespace. Suggested-by: Ingo Molnar Signed-off-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/73972fc7e2762e91912c6b9584582703d6f1b8cc.1507924831.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/orc-unwinder.txt | 2 +- Makefile | 4 ++-- arch/x86/Kconfig | 2 +- arch/x86/Kconfig.debug | 10 +++++----- arch/x86/configs/tiny.config | 4 ++-- arch/x86/configs/x86_64_defconfig | 2 +- arch/x86/include/asm/module.h | 2 +- arch/x86/include/asm/unwind.h | 8 ++++---- arch/x86/kernel/Makefile | 6 +++--- include/asm-generic/vmlinux.lds.h | 2 +- lib/Kconfig.debug | 2 +- scripts/Makefile.build | 2 +- 12 files changed, 23 insertions(+), 23 deletions(-) diff --git a/Documentation/x86/orc-unwinder.txt b/Documentation/x86/orc-unwinder.txt index af0c9a4c65a6..cd4b29be29af 100644 --- a/Documentation/x86/orc-unwinder.txt +++ b/Documentation/x86/orc-unwinder.txt @@ -4,7 +4,7 @@ ORC unwinder Overview -------- -The kernel CONFIG_ORC_UNWINDER option enables the ORC unwinder, which is +The kernel CONFIG_UNWINDER_ORC option enables the ORC unwinder, which is similar in concept to a DWARF unwinder. The difference is that the format of the ORC data is much simpler than DWARF, which in turn allows the ORC unwinder to be much simpler and faster. diff --git a/Makefile b/Makefile index 97b5ae76ac8c..41632ea25055 100644 --- a/Makefile +++ b/Makefile @@ -935,8 +935,8 @@ ifdef CONFIG_STACK_VALIDATION ifeq ($(has_libelf),1) objtool_target := tools/objtool FORCE else - ifdef CONFIG_ORC_UNWINDER - $(error "Cannot generate ORC metadata for CONFIG_ORC_UNWINDER=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") + ifdef CONFIG_UNWINDER_ORC + $(error "Cannot generate ORC metadata for CONFIG_UNWINDER_ORC=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") else $(warning "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel") endif diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 9bceea6a5852..6aad9f070b9d 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -171,7 +171,7 @@ config X86 select HAVE_PERF_USER_STACK_DUMP select HAVE_RCU_TABLE_FREE select HAVE_REGS_AND_STACK_ACCESS_API - select HAVE_RELIABLE_STACKTRACE if X86_64 && FRAME_POINTER_UNWINDER && STACK_VALIDATION + select HAVE_RELIABLE_STACKTRACE if X86_64 && UNWINDER_FRAME_POINTER && STACK_VALIDATION select HAVE_STACK_VALIDATION if X86_64 select HAVE_SYSCALL_TRACEPOINTS select HAVE_UNSTABLE_SCHED_CLOCK diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 90b123056f4b..728b44a56978 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -359,13 +359,13 @@ config PUNIT_ATOM_DEBUG choice prompt "Choose kernel unwinder" - default FRAME_POINTER_UNWINDER + default UNWINDER_FRAME_POINTER ---help--- This determines which method will be used for unwinding kernel stack traces for panics, oopses, bugs, warnings, perf, /proc//stack, livepatch, lockdep, and more. 
-config FRAME_POINTER_UNWINDER +config UNWINDER_FRAME_POINTER bool "Frame pointer unwinder" select FRAME_POINTER ---help--- @@ -380,7 +380,7 @@ config FRAME_POINTER_UNWINDER consistency model, as this is currently the only way to get a reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE). -config ORC_UNWINDER +config UNWINDER_ORC bool "ORC unwinder" depends on X86_64 select STACK_VALIDATION @@ -396,7 +396,7 @@ config ORC_UNWINDER Enabling this option will increase the kernel's runtime memory usage by roughly 2-4MB, depending on your kernel config. -config GUESS_UNWINDER +config UNWINDER_GUESS bool "Guess unwinder" depends on EXPERT ---help--- @@ -411,7 +411,7 @@ config GUESS_UNWINDER endchoice config FRAME_POINTER - depends on !ORC_UNWINDER && !GUESS_UNWINDER + depends on !UNWINDER_ORC && !UNWINDER_GUESS bool endmenu diff --git a/arch/x86/configs/tiny.config b/arch/x86/configs/tiny.config index 550cd5012b73..66c9e2aab16c 100644 --- a/arch/x86/configs/tiny.config +++ b/arch/x86/configs/tiny.config @@ -1,5 +1,5 @@ CONFIG_NOHIGHMEM=y # CONFIG_HIGHMEM4G is not set # CONFIG_HIGHMEM64G is not set -CONFIG_GUESS_UNWINDER=y -# CONFIG_FRAME_POINTER_UNWINDER is not set +CONFIG_UNWINDER_GUESS=y +# CONFIG_UNWINDER_FRAME_POINTER is not set diff --git a/arch/x86/configs/x86_64_defconfig b/arch/x86/configs/x86_64_defconfig index eb65c248708d..e32fc1f274d8 100644 --- a/arch/x86/configs/x86_64_defconfig +++ b/arch/x86/configs/x86_64_defconfig @@ -299,7 +299,7 @@ CONFIG_DEBUG_STACKOVERFLOW=y # CONFIG_DEBUG_RODATA_TEST is not set CONFIG_DEBUG_BOOT_PARAMS=y CONFIG_OPTIMIZE_INLINING=y -CONFIG_ORC_UNWINDER=y +CONFIG_UNWINDER_ORC=y CONFIG_SECURITY=y CONFIG_SECURITY_NETWORK=y CONFIG_SECURITY_SELINUX=y diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h index 8546fafa21a9..7948a17febb4 100644 --- a/arch/x86/include/asm/module.h +++ b/arch/x86/include/asm/module.h @@ -6,7 +6,7 @@ #include struct mod_arch_specific { -#ifdef CONFIG_ORC_UNWINDER +#ifdef CONFIG_UNWINDER_ORC unsigned int num_orcs; int *orc_unwind_ip; struct orc_entry *orc_unwind; diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index 87adc0d38c4a..e9cc6fe1fc6f 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -13,11 +13,11 @@ struct unwind_state { struct task_struct *task; int graph_idx; bool error; -#if defined(CONFIG_ORC_UNWINDER) +#if defined(CONFIG_UNWINDER_ORC) bool signal, full_regs; unsigned long sp, bp, ip; struct pt_regs *regs; -#elif defined(CONFIG_FRAME_POINTER_UNWINDER) +#elif defined(CONFIG_UNWINDER_FRAME_POINTER) bool got_irq; unsigned long *bp, *orig_sp, ip; struct pt_regs *regs; @@ -51,7 +51,7 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, __unwind_start(state, task, regs, first_frame); } -#if defined(CONFIG_ORC_UNWINDER) || defined(CONFIG_FRAME_POINTER_UNWINDER) +#if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER) static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) { if (unwind_done(state)) @@ -66,7 +66,7 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) } #endif -#ifdef CONFIG_ORC_UNWINDER +#ifdef CONFIG_UNWINDER_ORC void unwind_init(void); void unwind_module_init(struct module *mod, void *orc_ip, size_t orc_ip_size, void *orc, size_t orc_size); diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index fbb0cbd4f2aa..d12da41f72da 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -127,9 +127,9 @@ 
obj-$(CONFIG_PERF_EVENTS) += perf_regs.o obj-$(CONFIG_TRACING) += tracepoint.o obj-$(CONFIG_SCHED_MC_PRIO) += itmt.o -obj-$(CONFIG_ORC_UNWINDER) += unwind_orc.o -obj-$(CONFIG_FRAME_POINTER_UNWINDER) += unwind_frame.o -obj-$(CONFIG_GUESS_UNWINDER) += unwind_guess.o +obj-$(CONFIG_UNWINDER_ORC) += unwind_orc.o +obj-$(CONFIG_UNWINDER_FRAME_POINTER) += unwind_frame.o +obj-$(CONFIG_UNWINDER_GUESS) += unwind_guess.o ### # 64 bit specific files diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index e549bff87c5b..353f52fdc35e 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -688,7 +688,7 @@ #define BUG_TABLE #endif -#ifdef CONFIG_ORC_UNWINDER +#ifdef CONFIG_UNWINDER_ORC #define ORC_UNWIND_TABLE \ . = ALIGN(4); \ .orc_unwind_ip : AT(ADDR(.orc_unwind_ip) - LOAD_OFFSET) { \ diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index dfdad67d8f6c..ff21b4dbb392 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -376,7 +376,7 @@ config STACK_VALIDATION that runtime stack traces are more reliable. This is also a prerequisite for generation of ORC unwind data, which - is needed for CONFIG_ORC_UNWINDER. + is needed for CONFIG_UNWINDER_ORC. For more information, see tools/objtool/Documentation/stack-validation.txt. diff --git a/scripts/Makefile.build b/scripts/Makefile.build index bb831d49bcfd..e63af4e19382 100644 --- a/scripts/Makefile.build +++ b/scripts/Makefile.build @@ -259,7 +259,7 @@ ifneq ($(SKIP_STACK_VALIDATION),1) __objtool_obj := $(objtree)/tools/objtool/objtool -objtool_args = $(if $(CONFIG_ORC_UNWINDER),orc generate,check) +objtool_args = $(if $(CONFIG_UNWINDER_ORC),orc generate,check) ifndef CONFIG_FRAME_POINTER objtool_args += --no-fp From b40a923903d0a535676b8d7b22bfe17260c3d35a Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 13 Oct 2017 15:02:01 -0500 Subject: [PATCH 0568/1013] x86/unwind: Make CONFIG_UNWINDER_ORC=y the default in kconfig for 64-bit commit fc72ae40e30327aa24eb88a24b9c7058f938bd36 upstream. The ORC unwinder has been stable in testing so far. Give it much wider testing by making it the default in kconfig for x86_64. It's not yet supported for 32-bit, so leave frame pointers as the default there. Suggested-by: Ingo Molnar Signed-off-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/9b1237bbe7244ed9cdf8db2dcb1253e37e1c341e.1507924831.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/Kconfig.debug | 33 +++++++++++++++++---------------- 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug index 728b44a56978..6293a8768a91 100644 --- a/arch/x86/Kconfig.debug +++ b/arch/x86/Kconfig.debug @@ -359,27 +359,13 @@ config PUNIT_ATOM_DEBUG choice prompt "Choose kernel unwinder" - default UNWINDER_FRAME_POINTER + default UNWINDER_ORC if X86_64 + default UNWINDER_FRAME_POINTER if X86_32 ---help--- This determines which method will be used for unwinding kernel stack traces for panics, oopses, bugs, warnings, perf, /proc//stack, livepatch, lockdep, and more. -config UNWINDER_FRAME_POINTER - bool "Frame pointer unwinder" - select FRAME_POINTER - ---help--- - This option enables the frame pointer unwinder for unwinding kernel - stack traces. - - The unwinder itself is fast and it uses less RAM than the ORC - unwinder, but the kernel text size will grow by ~3% and the kernel's - overall performance will degrade by roughly 5-10%. 
- - This option is recommended if you want to use the livepatch - consistency model, as this is currently the only way to get a - reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE). - config UNWINDER_ORC bool "ORC unwinder" depends on X86_64 @@ -396,6 +382,21 @@ config UNWINDER_ORC Enabling this option will increase the kernel's runtime memory usage by roughly 2-4MB, depending on your kernel config. +config UNWINDER_FRAME_POINTER + bool "Frame pointer unwinder" + select FRAME_POINTER + ---help--- + This option enables the frame pointer unwinder for unwinding kernel + stack traces. + + The unwinder itself is fast and it uses less RAM than the ORC + unwinder, but the kernel text size will grow by ~3% and the kernel's + overall performance will degrade by roughly 5-10%. + + This option is recommended if you want to use the livepatch + consistency model, as this is currently the only way to get a + reliable stack trace (CONFIG_HAVE_RELIABLE_STACKTRACE). + config UNWINDER_GUESS bool "Guess unwinder" depends on EXPERT From eb3addb22727034dd733fcc07a3b4248fe523b72 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 13 Oct 2017 14:56:41 -0700 Subject: [PATCH 0569/1013] bitops: Add clear/set_bit32() to linux/bitops.h commit cbe96375025e14fc76f9ed42ee5225120d7210f8 upstream. Add two simple wrappers around set_bit/clear_bit() that accept the common case of an u32 array. This avoids writing casts in all callers. Signed-off-by: Andi Kleen Reviewed-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171013215645.23166-2-andi@firstfloor.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- include/linux/bitops.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/include/linux/bitops.h b/include/linux/bitops.h index d03c5dd6185d..8a7e9924df57 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -228,6 +228,32 @@ static inline unsigned long __ffs64(u64 word) return __ffs((unsigned long)word); } +/* + * clear_bit32 - Clear a bit in memory for u32 array + * @nr: Bit to clear + * @addr: u32 * address of bitmap + * + * Same as clear_bit, but avoids needing casts for u32 arrays. + */ + +static __always_inline void clear_bit32(long nr, volatile u32 *addr) +{ + clear_bit(nr, (volatile unsigned long *)addr); +} + +/* + * set_bit32 - Set a bit in memory for u32 array + * @nr: Bit to clear + * @addr: u32 * address of bitmap + * + * Same as set_bit, but avoids needing casts for u32 arrays. + */ + +static __always_inline void set_bit32(long nr, volatile u32 *addr) +{ + set_bit(nr, (volatile unsigned long *)addr); +} + #ifdef __KERNEL__ #ifndef set_mask_bits From d602a3465c77190ef75be052c76ccbdb6c72ed6e Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 13 Oct 2017 14:56:42 -0700 Subject: [PATCH 0570/1013] x86/cpuid: Add generic table for CPUID dependencies commit 0b00de857a648dafe7020878c7a27cf776f5edf4 upstream. Some CPUID features depend on other features. Currently it's possible to to clear dependent features, but not clear the base features, which can cause various interesting problems. This patch implements a generic table to describe dependencies between CPUID features, to be used by all code that clears CPUID. Some subsystems (like XSAVE) had an own implementation of this, but it's better to do it all in a single place for everyone. Then clear_cpu_cap and setup_clear_cpu_cap always look up this table and clear all dependencies too. 
This is intended to be a practical table: only for features that make sense to clear. If someone for example clears FPU, or other features that are essentially part of the required base feature set, not much is going to work. Handling that is right now out of scope. We're only handling features which can be usefully cleared. Signed-off-by: Andi Kleen Reviewed-by: Thomas Gleixner Cc: Jonathan McDowell Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171013215645.23166-3-andi@firstfloor.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeature.h | 9 +-- arch/x86/include/asm/cpufeatures.h | 5 ++ arch/x86/kernel/cpu/Makefile | 1 + arch/x86/kernel/cpu/cpuid-deps.c | 113 +++++++++++++++++++++++++++++ 4 files changed, 123 insertions(+), 5 deletions(-) create mode 100644 arch/x86/kernel/cpu/cpuid-deps.c diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 0dfa68438e80..bf6a76202a77 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -126,11 +126,10 @@ extern const char * const x86_bug_flags[NBUGINTS*32]; #define boot_cpu_has(bit) cpu_has(&boot_cpu_data, bit) #define set_cpu_cap(c, bit) set_bit(bit, (unsigned long *)((c)->x86_capability)) -#define clear_cpu_cap(c, bit) clear_bit(bit, (unsigned long *)((c)->x86_capability)) -#define setup_clear_cpu_cap(bit) do { \ - clear_cpu_cap(&boot_cpu_data, bit); \ - set_bit(bit, (unsigned long *)cpu_caps_cleared); \ -} while (0) + +extern void setup_clear_cpu_cap(unsigned int bit); +extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit); + #define setup_force_cpu_cap(bit) do { \ set_cpu_cap(&boot_cpu_data, bit); \ set_bit(bit, (unsigned long *)cpu_caps_set); \ diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 793690fbda36..9a0a5b128a30 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -22,6 +22,11 @@ * this feature bit is not displayed in /proc/cpuinfo at all. */ +/* + * When adding new features here that depend on other features, + * please update the table in kernel/cpu/cpuid-deps.c + */ + /* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */ #define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ #define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index c60922a66385..90cb82dbba57 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -23,6 +23,7 @@ obj-y += rdrand.o obj-y += match.o obj-y += bugs.o obj-$(CONFIG_CPU_FREQ) += aperfmperf.o +obj-y += cpuid-deps.o obj-$(CONFIG_PROC_FS) += proc.o obj-$(CONFIG_X86_FEATURE_NAMES) += capflags.o powerflags.o diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c new file mode 100644 index 000000000000..e48eb7313120 --- /dev/null +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -0,0 +1,113 @@ +/* Declare dependencies between CPUIDs */ +#include +#include +#include +#include + +struct cpuid_dep { + unsigned int feature; + unsigned int depends; +}; + +/* + * Table of CPUID features that depend on others. + * + * This only includes dependencies that can be usefully disabled, not + * features part of the base set (like FPU). + * + * Note this all is not __init / __initdata because it can be + * called from cpu hotplug. It shouldn't do anything in this case, + * but it's difficult to tell that to the init reference checker. 
+ */ +const static struct cpuid_dep cpuid_deps[] = { + { X86_FEATURE_XSAVEOPT, X86_FEATURE_XSAVE }, + { X86_FEATURE_XSAVEC, X86_FEATURE_XSAVE }, + { X86_FEATURE_XSAVES, X86_FEATURE_XSAVE }, + { X86_FEATURE_AVX, X86_FEATURE_XSAVE }, + { X86_FEATURE_PKU, X86_FEATURE_XSAVE }, + { X86_FEATURE_MPX, X86_FEATURE_XSAVE }, + { X86_FEATURE_XGETBV1, X86_FEATURE_XSAVE }, + { X86_FEATURE_FXSR_OPT, X86_FEATURE_FXSR }, + { X86_FEATURE_XMM, X86_FEATURE_FXSR }, + { X86_FEATURE_XMM2, X86_FEATURE_XMM }, + { X86_FEATURE_XMM3, X86_FEATURE_XMM2 }, + { X86_FEATURE_XMM4_1, X86_FEATURE_XMM2 }, + { X86_FEATURE_XMM4_2, X86_FEATURE_XMM2 }, + { X86_FEATURE_XMM3, X86_FEATURE_XMM2 }, + { X86_FEATURE_PCLMULQDQ, X86_FEATURE_XMM2 }, + { X86_FEATURE_SSSE3, X86_FEATURE_XMM2, }, + { X86_FEATURE_F16C, X86_FEATURE_XMM2, }, + { X86_FEATURE_AES, X86_FEATURE_XMM2 }, + { X86_FEATURE_SHA_NI, X86_FEATURE_XMM2 }, + { X86_FEATURE_FMA, X86_FEATURE_AVX }, + { X86_FEATURE_AVX2, X86_FEATURE_AVX, }, + { X86_FEATURE_AVX512F, X86_FEATURE_AVX, }, + { X86_FEATURE_AVX512IFMA, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512PF, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512ER, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512CD, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512DQ, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512BW, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512VL, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512VBMI, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F }, + {} +}; + +static inline void __clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit) +{ + clear_bit32(bit, c->x86_capability); +} + +static inline void __setup_clear_cpu_cap(unsigned int bit) +{ + clear_cpu_cap(&boot_cpu_data, bit); + set_bit32(bit, cpu_caps_cleared); +} + +static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) +{ + if (!c) + __setup_clear_cpu_cap(feature); + else + __clear_cpu_cap(c, feature); +} + +static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) +{ + bool changed; + DECLARE_BITMAP(disable, NCAPINTS * sizeof(u32) * 8); + const struct cpuid_dep *d; + + clear_feature(c, feature); + + /* Collect all features to disable, handling dependencies */ + memset(disable, 0, sizeof(disable)); + __set_bit(feature, disable); + + /* Loop until we get a stable state. */ + do { + changed = false; + for (d = cpuid_deps; d->feature; d++) { + if (!test_bit(d->depends, disable)) + continue; + if (__test_and_set_bit(d->feature, disable)) + continue; + + changed = true; + clear_feature(c, d->feature); + } + } while (changed); +} + +void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) +{ + do_clear_cpu_cap(c, feature); +} + +void setup_clear_cpu_cap(unsigned int feature) +{ + do_clear_cpu_cap(NULL, feature); +} From f01d7efac7929e3630e25a84e101228ea907bbea Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 13 Oct 2017 14:56:43 -0700 Subject: [PATCH 0571/1013] x86/fpu: Parse clearcpuid= as early XSAVE argument commit 0c2a3913d6f50503f7c59d83a6219e39508cc898 upstream. With a followon patch we want to make clearcpuid affect the XSAVE configuration. But xsave is currently initialized before arguments are parsed. Move the clearcpuid= parsing into the special early xsave argument parsing code. Since clearcpuid= contains a = we need to keep the old __setup around as a dummy, otherwise it would end up as a environment variable in init's environment. 
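As a rough illustration of the early-parsing half of this change, here is a stand-alone model of pulling a "clearcpuid=<bit>" token out of a boot command line and range-checking it before anything else consumes it. strstr()/strtol() merely stand in for the kernel's cmdline_find_option()/get_option(), and the NCAPINTS value and the bit number in the example command line are assumed purely for the sketch; the real change is in the fpu__init_parse_early_param() hunk below.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define NCAPINTS 18     /* assumed capability-word count, for the sketch only */

    /* Return the requested bit, or -1 if the option is absent or out of range. */
    static int parse_clearcpuid(const char *cmdline)
    {
            const char *opt = strstr(cmdline, "clearcpuid=");
            char *end;
            long bit;

            if (!opt)
                    return -1;
            opt += strlen("clearcpuid=");
            bit = strtol(opt, &end, 0);
            if (end == opt || bit < 0 || bit >= NCAPINTS * 32)
                    return -1;
            return (int)bit;
    }

    int main(void)
    {
            const char *cmdline = "ro quiet clearcpuid=154 console=ttyS0";
            int bit = parse_clearcpuid(cmdline);

            if (bit >= 0)
                    printf("would clear CPU capability bit %d before XSAVE setup\n", bit);
            else
                    printf("no (valid) clearcpuid= option found\n");
            return 0;
    }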
Signed-off-by: Andi Kleen Reviewed-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171013215645.23166-4-andi@firstfloor.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/common.c | 16 +++++++--------- arch/x86/kernel/fpu/init.c | 11 +++++++++++ 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index c9176bae7fd8..03bb004bb15e 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1301,18 +1301,16 @@ void print_cpu_info(struct cpuinfo_x86 *c) pr_cont(")\n"); } -static __init int setup_disablecpuid(char *arg) +/* + * clearcpuid= was already parsed in fpu__init_parse_early_param. + * But we need to keep a dummy __setup around otherwise it would + * show up as an environment variable for init. + */ +static __init int setup_clearcpuid(char *arg) { - int bit; - - if (get_option(&arg, &bit) && bit >= 0 && bit < NCAPINTS * 32) - setup_clear_cpu_cap(bit); - else - return 0; - return 1; } -__setup("clearcpuid=", setup_disablecpuid); +__setup("clearcpuid=", setup_clearcpuid); #ifdef CONFIG_X86_64 DEFINE_PER_CPU_FIRST(union irq_stack_union, diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index 7affb7e3d9a5..6abd83572b01 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -249,6 +249,10 @@ static void __init fpu__init_system_ctx_switch(void) */ static void __init fpu__init_parse_early_param(void) { + char arg[32]; + char *argptr = arg; + int bit; + if (cmdline_find_option_bool(boot_command_line, "no387")) setup_clear_cpu_cap(X86_FEATURE_FPU); @@ -266,6 +270,13 @@ static void __init fpu__init_parse_early_param(void) if (cmdline_find_option_bool(boot_command_line, "noxsaves")) setup_clear_cpu_cap(X86_FEATURE_XSAVES); + + if (cmdline_find_option(boot_command_line, "clearcpuid", arg, + sizeof(arg)) && + get_option(&argptr, &bit) && + bit >= 0 && + bit < NCAPINTS * 32) + setup_clear_cpu_cap(bit); } /* From 0e7127aa76e05b51f767dbec74b236a1fa5eab06 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 13 Oct 2017 14:56:44 -0700 Subject: [PATCH 0572/1013] x86/fpu: Make XSAVE check the base CPUID features before enabling commit ccb18db2ab9d923df07e7495123fe5fb02329713 upstream. Before enabling XSAVE, not only check the XSAVE specific CPUID bits, but also the base CPUID features of the respective XSAVE feature. This allows to disable individual XSAVE states using the existing clearcpuid= option, which can be useful for performance testing and debugging, and also in general avoids inconsistencies. 
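The effect of the check can be sketched in user space: walk a small "xstate component -> base CPUID feature" list and drop any component whose base feature is gone. The table below is a trimmed, made-up stand-in for the kernel's xsave_cpuid_features[], and has_cpu_feature() is a stub that pretends MPX was cleared via clearcpuid=; the authoritative version is the fpu__init_system_xstate() hunk below.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    enum { F_FPU, F_SSE, F_AVX, F_MPX, F_AVX512F, F_PKU };

    /* xstate component bit -> base CPUID feature it relies on (trimmed list) */
    static const int xsave_base_feature[] = {
            F_FPU,          /* x87 state         */
            F_SSE,          /* SSE registers     */
            F_AVX,          /* AVX upper halves  */
            F_MPX,          /* MPX bounds regs   */
            F_MPX,          /* MPX CSR           */
            F_AVX512F,      /* AVX-512 opmask    */
            F_AVX512F,      /* AVX-512 ZMM_Hi256 */
            F_AVX512F,      /* AVX-512 Hi16_ZMM  */
            F_PKU,          /* protection keys   */
    };

    static bool has_cpu_feature(int f)
    {
            return f != F_MPX;      /* pretend MPX was disabled on the command line */
    }

    int main(void)
    {
            /* Pretend CPUID advertised all nine sketched components. */
            uint64_t xfeatures_mask = (1ULL << 9) - 1;

            for (unsigned int i = 0;
                 i < sizeof(xsave_base_feature) / sizeof(xsave_base_feature[0]); i++)
                    if (!has_cpu_feature(xsave_base_feature[i]))
                            xfeatures_mask &= ~(1ULL << i);

            printf("xfeatures_mask after the base-feature check: %#llx\n",
                   (unsigned long long)xfeatures_mask);
            return 0;
    }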
Signed-off-by: Andi Kleen Reviewed-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171013215645.23166-5-andi@firstfloor.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/fpu/xstate.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index f1d5476c9022..fb581292975b 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -15,6 +15,7 @@ #include #include +#include /* * Although we spell it out in here, the Processor Trace @@ -36,6 +37,19 @@ static const char *xfeature_names[] = "unknown xstate feature" , }; +static short xsave_cpuid_features[] __initdata = { + X86_FEATURE_FPU, + X86_FEATURE_XMM, + X86_FEATURE_AVX, + X86_FEATURE_MPX, + X86_FEATURE_MPX, + X86_FEATURE_AVX512F, + X86_FEATURE_AVX512F, + X86_FEATURE_AVX512F, + X86_FEATURE_INTEL_PT, + X86_FEATURE_PKU, +}; + /* * Mask of xstate features supported by the CPU and the kernel: */ @@ -726,6 +740,7 @@ void __init fpu__init_system_xstate(void) unsigned int eax, ebx, ecx, edx; static int on_boot_cpu __initdata = 1; int err; + int i; WARN_ON_FPU(!on_boot_cpu); on_boot_cpu = 0; @@ -759,6 +774,14 @@ void __init fpu__init_system_xstate(void) goto out_disable; } + /* + * Clear XSAVE features that are disabled in the normal CPUID. + */ + for (i = 0; i < ARRAY_SIZE(xsave_cpuid_features); i++) { + if (!boot_cpu_has(xsave_cpuid_features[i])) + xfeatures_mask &= ~BIT(i); + } + xfeatures_mask &= fpu__get_supported_xfeatures_mask(); /* Enable xstate instructions to be able to continue with initialization: */ From 110ad51cc874a943792e16258e52ff3360cb4f96 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Fri, 13 Oct 2017 14:56:45 -0700 Subject: [PATCH 0573/1013] x86/fpu: Remove the explicit clearing of XSAVE dependent features commit 73e3a7d2a7c3be29a5a22b85026f6cfa5664267f upstream. Clearing a CPU feature with setup_clear_cpu_cap() clears all features which depend on it. Expressing feature dependencies in one place is easier to maintain than keeping functions like fpu__xstate_clear_all_cpu_caps() up to date. The features which depend on XSAVE have their dependency expressed in the dependency table, so its sufficient to clear X86_FEATURE_XSAVE. Remove the explicit clearing of XSAVE dependent features. 
Signed-off-by: Andi Kleen Reviewed-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171013215645.23166-6-andi@firstfloor.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/fpu/xstate.c | 20 -------------------- 1 file changed, 20 deletions(-) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index fb581292975b..87a57b7642d3 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -73,26 +73,6 @@ unsigned int fpu_user_xstate_size; void fpu__xstate_clear_all_cpu_caps(void) { setup_clear_cpu_cap(X86_FEATURE_XSAVE); - setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT); - setup_clear_cpu_cap(X86_FEATURE_XSAVEC); - setup_clear_cpu_cap(X86_FEATURE_XSAVES); - setup_clear_cpu_cap(X86_FEATURE_AVX); - setup_clear_cpu_cap(X86_FEATURE_AVX2); - setup_clear_cpu_cap(X86_FEATURE_AVX512F); - setup_clear_cpu_cap(X86_FEATURE_AVX512IFMA); - setup_clear_cpu_cap(X86_FEATURE_AVX512PF); - setup_clear_cpu_cap(X86_FEATURE_AVX512ER); - setup_clear_cpu_cap(X86_FEATURE_AVX512CD); - setup_clear_cpu_cap(X86_FEATURE_AVX512DQ); - setup_clear_cpu_cap(X86_FEATURE_AVX512BW); - setup_clear_cpu_cap(X86_FEATURE_AVX512VL); - setup_clear_cpu_cap(X86_FEATURE_MPX); - setup_clear_cpu_cap(X86_FEATURE_XGETBV1); - setup_clear_cpu_cap(X86_FEATURE_AVX512VBMI); - setup_clear_cpu_cap(X86_FEATURE_PKU); - setup_clear_cpu_cap(X86_FEATURE_AVX512_4VNNIW); - setup_clear_cpu_cap(X86_FEATURE_AVX512_4FMAPS); - setup_clear_cpu_cap(X86_FEATURE_AVX512_VPOPCNTDQ); } /* From 3b57d66c8e53390b0035c578063db22c73ebce1a Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 16 Oct 2017 16:22:31 -0700 Subject: [PATCH 0574/1013] x86/platform/UV: Convert timers to use timer_setup() commit 376f3bcebdc999cc737d9052109cc33b573b3a8b upstream. In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Signed-off-by: Kees Cook Signed-off-by: Thomas Gleixner Cc: Dimitri Sivanich Cc: Russ Anderson Cc: Mike Travis Link: https://lkml.kernel.org/r/20171016232231.GA100493@beast Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/apic/x2apic_uv_x.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/apic/x2apic_uv_x.c b/arch/x86/kernel/apic/x2apic_uv_x.c index 0d57bb9079c9..c0b694810ff4 100644 --- a/arch/x86/kernel/apic/x2apic_uv_x.c +++ b/arch/x86/kernel/apic/x2apic_uv_x.c @@ -920,9 +920,8 @@ static __init void uv_rtc_init(void) /* * percpu heartbeat timer */ -static void uv_heartbeat(unsigned long ignored) +static void uv_heartbeat(struct timer_list *timer) { - struct timer_list *timer = &uv_scir_info->timer; unsigned char bits = uv_scir_info->state; /* Flip heartbeat bit: */ @@ -947,7 +946,7 @@ static int uv_heartbeat_enable(unsigned int cpu) struct timer_list *timer = &uv_cpu_scir_info(cpu)->timer; uv_set_cpu_scir_bits(cpu, SCIR_CPU_HEARTBEAT|SCIR_CPU_ACTIVITY); - setup_pinned_timer(timer, uv_heartbeat, cpu); + timer_setup(timer, uv_heartbeat, TIMER_PINNED); timer->expires = jiffies + SCIR_CPU_HB_INTERVAL; add_timer_on(timer, cpu); uv_cpu_scir_info(cpu)->enabled = 1; From 61d1ad3631150386f3c142daff97913c1770e0fa Mon Sep 17 00:00:00 2001 From: Kamalesh Babulal Date: Sat, 14 Oct 2017 20:17:54 +0530 Subject: [PATCH 0575/1013] objtool: Print top level commands on incorrect usage commit 6a93bb7e4a7d6670677d5b0eb980936eb9cc5d2e upstream. 
Print top-level objtool commands, along with the error on incorrect command line usage. Objtool command line parser exit's with code 129, for incorrect usage. Convert the cmd_usage() exit code also, to maintain consistency across objtool. After the patch: $ ./objtool -j Unknown option: -j usage: objtool COMMAND [ARGS] Commands: check Perform stack metadata validation on an object file orc Generate in-place ORC unwind tables for an object file $ echo $? 129 Signed-off-by: Kamalesh Babulal Acked-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1507992474-16142-1-git-send-email-kamalesh@linux.vnet.ibm.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/objtool/objtool.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/tools/objtool/objtool.c b/tools/objtool/objtool.c index 31e0f9143840..07f329919828 100644 --- a/tools/objtool/objtool.c +++ b/tools/objtool/objtool.c @@ -70,7 +70,7 @@ static void cmd_usage(void) printf("\n"); - exit(1); + exit(129); } static void handle_options(int *argc, const char ***argv) @@ -86,9 +86,7 @@ static void handle_options(int *argc, const char ***argv) break; } else { fprintf(stderr, "Unknown option: %s\n", cmd); - fprintf(stderr, "\n Usage: %s\n", - objtool_usage_string); - exit(1); + cmd_usage(); } (*argv)++; From 36295155d8d4c3f9fb425faac9120559c051b861 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 18 Oct 2017 19:39:35 +0200 Subject: [PATCH 0576/1013] x86/cpuid: Prevent out of bound access in do_clear_cpu_cap() commit 57b8b1a1856adaa849d02d547411a553a531022b upstream. do_clear_cpu_cap() allocates a bitmap to keep track of disabled feature dependencies. That bitmap is sized NCAPINTS * BITS_PER_INIT. The possible 'features' which can be handed in are larger than this, because after the capabilities the bug 'feature' bits occupy another 32bit. Not really obvious... So clearing any of the misfeature bits, as 32bit does for the F00F bug, accesses that bitmap out of bounds thereby corrupting the stack. Size the bitmap proper and add a sanity check to catch accidental out of bound access. Fixes: 0b00de857a64 ("x86/cpuid: Add generic table for CPUID dependencies") Reported-by: kernel test robot Signed-off-by: Thomas Gleixner Cc: Andi Kleen Cc: Borislav Petkov Link: https://lkml.kernel.org/r/20171018022023.GA12058@yexl-desktop Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/cpuid-deps.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index e48eb7313120..c1d49842a411 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -75,11 +75,17 @@ static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) __clear_cpu_cap(c, feature); } +/* Take the capabilities and the BUG bits into account */ +#define MAX_FEATURE_BITS ((NCAPINTS + NBUGINTS) * sizeof(u32) * 8) + static void do_clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int feature) { - bool changed; - DECLARE_BITMAP(disable, NCAPINTS * sizeof(u32) * 8); + DECLARE_BITMAP(disable, MAX_FEATURE_BITS); const struct cpuid_dep *d; + bool changed; + + if (WARN_ON(feature >= MAX_FEATURE_BITS)) + return; clear_feature(c, feature); From 4afaf6ea65acb07e151470580d89e6a2c0268610 Mon Sep 17 00:00:00 2001 From: "Kirill A. 
Shutemov" Date: Fri, 29 Sep 2017 17:08:16 +0300 Subject: [PATCH 0577/1013] mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y commit 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4 upstream. Size of the mem_section[] array depends on the size of the physical address space. In preparation for boot-time switching between paging modes on x86-64 we need to make the allocation of mem_section[] dynamic, because otherwise we waste a lot of RAM: with CONFIG_NODE_SHIFT=10, mem_section[] size is 32kB for 4-level paging and 2MB for 5-level paging mode. The patch allocates the array on the first call to sparse_memory_present_with_active_regions(). Signed-off-by: Kirill A. Shutemov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Cyrill Gorcunov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-2-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- include/linux/mmzone.h | 6 +++++- mm/page_alloc.c | 10 ++++++++++ mm/sparse.c | 17 +++++++++++------ 3 files changed, 26 insertions(+), 7 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 18b06983131a..f0938257ee6d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1152,13 +1152,17 @@ struct mem_section { #define SECTION_ROOT_MASK (SECTIONS_PER_ROOT - 1) #ifdef CONFIG_SPARSEMEM_EXTREME -extern struct mem_section *mem_section[NR_SECTION_ROOTS]; +extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif static inline struct mem_section *__nr_to_section(unsigned long nr) { +#ifdef CONFIG_SPARSEMEM_EXTREME + if (!mem_section) + return NULL; +#endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d51c2087c498..02cf30bd01cd 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5651,6 +5651,16 @@ void __init sparse_memory_present_with_active_regions(int nid) unsigned long start_pfn, end_pfn; int i, this_nid; +#ifdef CONFIG_SPARSEMEM_EXTREME + if (!mem_section) { + unsigned long size, align; + + size = sizeof(struct mem_section) * NR_SECTION_ROOTS; + align = 1 << (INTERNODE_CACHE_SHIFT); + mem_section = memblock_virt_alloc(size, align); + } +#endif + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, &this_nid) memory_present(this_nid, start_pfn, end_pfn); } diff --git a/mm/sparse.c b/mm/sparse.c index 4900707ae146..044138852baf 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -23,8 +23,7 @@ * 1) mem_section - memory sections, mem_map's for valid memory */ #ifdef CONFIG_SPARSEMEM_EXTREME -struct mem_section *mem_section[NR_SECTION_ROOTS] - ____cacheline_internodealigned_in_smp; +struct mem_section **mem_section; #else struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT] ____cacheline_internodealigned_in_smp; @@ -101,7 +100,7 @@ static inline int sparse_index_init(unsigned long section_nr, int nid) int __section_nr(struct mem_section* ms) { unsigned long root_nr; - struct mem_section* root; + struct mem_section *root = NULL; for (root_nr = 0; root_nr < NR_SECTION_ROOTS; root_nr++) { root = __nr_to_section(root_nr * SECTIONS_PER_ROOT); @@ -112,7 +111,7 @@ int __section_nr(struct mem_section* ms) break; } - VM_BUG_ON(root_nr == NR_SECTION_ROOTS); + VM_BUG_ON(!root); return (root_nr * SECTIONS_PER_ROOT) + (ms - root); } @@ -330,11 +329,17 @@ 
sparse_early_usemaps_alloc_pgdat_section(struct pglist_data *pgdat, static void __init check_usemap_section_nr(int nid, unsigned long *usemap) { unsigned long usemap_snr, pgdat_snr; - static unsigned long old_usemap_snr = NR_MEM_SECTIONS; - static unsigned long old_pgdat_snr = NR_MEM_SECTIONS; + static unsigned long old_usemap_snr; + static unsigned long old_pgdat_snr; struct pglist_data *pgdat = NODE_DATA(nid); int usemap_nid; + /* First call */ + if (!old_usemap_snr) { + old_usemap_snr = NR_MEM_SECTIONS; + old_pgdat_snr = NR_MEM_SECTIONS; + } + usemap_snr = pfn_to_section_nr(__pa(usemap) >> PAGE_SHIFT); pgdat_snr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT); if (usemap_snr == pgdat_snr) From 873f59b8bd88c897fc1545efdf5742fc63f7a252 Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Fri, 29 Sep 2017 17:08:18 +0300 Subject: [PATCH 0578/1013] x86/kasan: Use the same shadow offset for 4- and 5-level paging commit 12a8cc7fcf54a8575f094be1e99032ec38aa045c upstream. We are going to support boot-time switching between 4- and 5-level paging. For KASAN it means we cannot have different KASAN_SHADOW_OFFSET for different paging modes: the constant is passed to gcc to generate code and cannot be changed at runtime. This patch changes KASAN code to use 0xdffffc0000000000 as shadow offset for both 4- and 5-level paging. For 5-level paging it means that shadow memory region is not aligned to PGD boundary anymore and we have to handle unaligned parts of the region properly. In addition, we have to exclude paravirt code from KASAN instrumentation as we now use set_pgd() before KASAN is fully ready. [kirill.shutemov@linux.intel.com: clenaup, changelog message] Signed-off-by: Andrey Ryabinin Signed-off-by: Kirill A. Shutemov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Cyrill Gorcunov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-4-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 2 +- arch/x86/Kconfig | 1 - arch/x86/kernel/Makefile | 3 +- arch/x86/mm/kasan_init_64.c | 101 +++++++++++++++++++++++++------- 4 files changed, 83 insertions(+), 24 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index b0798e281aa6..3448e675b462 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -34,7 +34,7 @@ ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) ... unused hole ... -ffd8000000000000 - fff7ffffffffffff (=53 bits) kasan shadow memory (8PB) +ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB) ... unused hole ... ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 6aad9f070b9d..59d5b3fa88e6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -303,7 +303,6 @@ config ARCH_SUPPORTS_DEBUG_PAGEALLOC config KASAN_SHADOW_OFFSET hex depends on KASAN - default 0xdff8000000000000 if X86_5LEVEL default 0xdffffc0000000000 config HAVE_INTEL_TXT diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile index d12da41f72da..295abaa58add 100644 --- a/arch/x86/kernel/Makefile +++ b/arch/x86/kernel/Makefile @@ -25,7 +25,8 @@ endif KASAN_SANITIZE_head$(BITS).o := n KASAN_SANITIZE_dumpstack.o := n KASAN_SANITIZE_dumpstack_$(BITS).o := n -KASAN_SANITIZE_stacktrace.o := n +KASAN_SANITIZE_stacktrace.o := n +KASAN_SANITIZE_paravirt.o := n OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o := y OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o := y diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 8f5be3eb40dd..2b60dc6e64b1 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -16,6 +16,8 @@ extern struct range pfn_mapped[E820_MAX_ENTRIES]; +static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE); + static int __init map_range(struct range *range) { unsigned long start; @@ -31,8 +33,10 @@ static void __init clear_pgds(unsigned long start, unsigned long end) { pgd_t *pgd; + /* See comment in kasan_init() */ + unsigned long pgd_end = end & PGDIR_MASK; - for (; start < end; start += PGDIR_SIZE) { + for (; start < pgd_end; start += PGDIR_SIZE) { pgd = pgd_offset_k(start); /* * With folded p4d, pgd_clear() is nop, use p4d_clear() @@ -43,29 +47,61 @@ static void __init clear_pgds(unsigned long start, else pgd_clear(pgd); } + + pgd = pgd_offset_k(start); + for (; start < end; start += P4D_SIZE) + p4d_clear(p4d_offset(pgd, start)); +} + +static inline p4d_t *early_p4d_offset(pgd_t *pgd, unsigned long addr) +{ + unsigned long p4d; + + if (!IS_ENABLED(CONFIG_X86_5LEVEL)) + return (p4d_t *)pgd; + + p4d = __pa_nodebug(pgd_val(*pgd)) & PTE_PFN_MASK; + p4d += __START_KERNEL_map - phys_base; + return (p4d_t *)p4d + p4d_index(addr); +} + +static void __init kasan_early_p4d_populate(pgd_t *pgd, + unsigned long addr, + unsigned long end) +{ + pgd_t pgd_entry; + p4d_t *p4d, p4d_entry; + unsigned long next; + + if (pgd_none(*pgd)) { + pgd_entry = __pgd(_KERNPG_TABLE | __pa_nodebug(kasan_zero_p4d)); + set_pgd(pgd, pgd_entry); + } + + p4d = early_p4d_offset(pgd, addr); + do { + next = p4d_addr_end(addr, end); + + if (!p4d_none(*p4d)) + continue; + + p4d_entry = __p4d(_KERNPG_TABLE | __pa_nodebug(kasan_zero_pud)); + set_p4d(p4d, p4d_entry); + } while (p4d++, addr = next, addr != end && p4d_none(*p4d)); } static void __init kasan_map_early_shadow(pgd_t *pgd) { - int i; - unsigned long start = KASAN_SHADOW_START; + /* See comment in kasan_init() */ + unsigned long addr = KASAN_SHADOW_START & PGDIR_MASK; unsigned long end = KASAN_SHADOW_END; + unsigned long next; - for (i = pgd_index(start); start < end; i++) { - switch (CONFIG_PGTABLE_LEVELS) { - case 4: - pgd[i] = __pgd(__pa_nodebug(kasan_zero_pud) | - _KERNPG_TABLE); - break; - case 5: - pgd[i] = __pgd(__pa_nodebug(kasan_zero_p4d) | - _KERNPG_TABLE); - break; - default: - BUILD_BUG(); - } - start += PGDIR_SIZE; - } + pgd += pgd_index(addr); + do { + next = pgd_addr_end(addr, end); + kasan_early_p4d_populate(pgd, addr, next); + } while (pgd++, addr = next, addr != end); } #ifdef CONFIG_KASAN_INLINE @@ -102,7 +138,7 @@ void __init kasan_early_init(void) for (i = 0; i < PTRS_PER_PUD; i++) kasan_zero_pud[i] = __pud(pud_val); 
- for (i = 0; CONFIG_PGTABLE_LEVELS >= 5 && i < PTRS_PER_P4D; i++) + for (i = 0; IS_ENABLED(CONFIG_X86_5LEVEL) && i < PTRS_PER_P4D; i++) kasan_zero_p4d[i] = __p4d(p4d_val); kasan_map_early_shadow(early_top_pgt); @@ -118,12 +154,35 @@ void __init kasan_init(void) #endif memcpy(early_top_pgt, init_top_pgt, sizeof(early_top_pgt)); + + /* + * We use the same shadow offset for 4- and 5-level paging to + * facilitate boot-time switching between paging modes. + * As result in 5-level paging mode KASAN_SHADOW_START and + * KASAN_SHADOW_END are not aligned to PGD boundary. + * + * KASAN_SHADOW_START doesn't share PGD with anything else. + * We claim whole PGD entry to make things easier. + * + * KASAN_SHADOW_END lands in the last PGD entry and it collides with + * bunch of things like kernel code, modules, EFI mapping, etc. + * We need to take extra steps to not overwrite them. + */ + if (IS_ENABLED(CONFIG_X86_5LEVEL)) { + void *ptr; + + ptr = (void *)pgd_page_vaddr(*pgd_offset_k(KASAN_SHADOW_END)); + memcpy(tmp_p4d_table, (void *)ptr, sizeof(tmp_p4d_table)); + set_pgd(&early_top_pgt[pgd_index(KASAN_SHADOW_END)], + __pgd(__pa(tmp_p4d_table) | _KERNPG_TABLE)); + } + load_cr3(early_top_pgt); __flush_tlb_all(); - clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END); + clear_pgds(KASAN_SHADOW_START & PGDIR_MASK, KASAN_SHADOW_END); - kasan_populate_zero_shadow((void *)KASAN_SHADOW_START, + kasan_populate_zero_shadow((void *)(KASAN_SHADOW_START & PGDIR_MASK), kasan_mem_to_shadow((void *)PAGE_OFFSET)); for (i = 0; i < E820_MAX_ENTRIES; i++) { From 13bda9cfea5116f15a69725c15bc2614a6c803dc Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Fri, 29 Sep 2017 17:08:19 +0300 Subject: [PATCH 0579/1013] x86/xen: Provide pre-built page tables only for CONFIG_XEN_PV=y and CONFIG_XEN_PVH=y commit 4375c29985f155d7eb2346615d84e62d1b673682 upstream. Looks like we only need pre-built page tables in the CONFIG_XEN_PV=y and CONFIG_XEN_PVH=y cases. Let's not provide them for other configurations. Signed-off-by: Kirill A. Shutemov Reviewed-by: Juergen Gross Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Cyrill Gorcunov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-5-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/head_64.S | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 6733cac7cba5..9072d324d430 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -38,11 +38,12 @@ * */ -#define p4d_index(x) (((x) >> P4D_SHIFT) & (PTRS_PER_P4D-1)) #define pud_index(x) (((x) >> PUD_SHIFT) & (PTRS_PER_PUD-1)) +#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) PGD_PAGE_OFFSET = pgd_index(__PAGE_OFFSET_BASE) PGD_START_KERNEL = pgd_index(__START_KERNEL_map) +#endif L3_START_KERNEL = pud_index(__START_KERNEL_map) .text @@ -365,10 +366,7 @@ NEXT_PAGE(early_dynamic_pgts) .data -#ifndef CONFIG_XEN -NEXT_PAGE(init_top_pgt) - .fill 512,8,0 -#else +#if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) NEXT_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .org init_top_pgt + PGD_PAGE_OFFSET*8, 0 @@ -385,6 +383,9 @@ NEXT_PAGE(level2_ident_pgt) * Don't set NX because code runs from these pages. 
*/ PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD) +#else +NEXT_PAGE(init_top_pgt) + .fill 512,8,0 #endif #ifdef CONFIG_X86_5LEVEL From af02cd973d86cf3f6bd6eea928ccc9e70a4132d0 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Fri, 29 Sep 2017 17:08:20 +0300 Subject: [PATCH 0580/1013] x86/xen: Drop 5-level paging support code from the XEN_PV code commit 773dd2fca581b0a80e5a33332cc8ee67e5a79cba upstream. It was decided 5-level paging is not going to be supported in XEN_PV. Let's drop the dead code from the XEN_PV code. Tested-by: Juergen Gross Signed-off-by: Kirill A. Shutemov Reviewed-by: Juergen Gross Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Cyrill Gorcunov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/20170929140821.37654-6-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/mmu_pv.c | 159 ++++++++++++++++-------------------------- 1 file changed, 60 insertions(+), 99 deletions(-) diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 71495f1a86d7..2ccdaba31a07 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -449,7 +449,7 @@ __visible pmd_t xen_make_pmd(pmdval_t pmd) } PV_CALLEE_SAVE_REGS_THUNK(xen_make_pmd); -#if CONFIG_PGTABLE_LEVELS == 4 +#ifdef CONFIG_X86_64 __visible pudval_t xen_pud_val(pud_t pud) { return pte_mfn_to_pfn(pud.pud); @@ -538,7 +538,7 @@ static void xen_set_p4d(p4d_t *ptr, p4d_t val) xen_mc_issue(PARAVIRT_LAZY_MMU); } -#endif /* CONFIG_PGTABLE_LEVELS == 4 */ +#endif /* CONFIG_X86_64 */ static int xen_pmd_walk(struct mm_struct *mm, pmd_t *pmd, int (*func)(struct mm_struct *mm, struct page *, enum pt_level), @@ -580,21 +580,17 @@ static int xen_p4d_walk(struct mm_struct *mm, p4d_t *p4d, int (*func)(struct mm_struct *mm, struct page *, enum pt_level), bool last, unsigned long limit) { - int i, nr, flush = 0; + int flush = 0; + pud_t *pud; - nr = last ? 
p4d_index(limit) + 1 : PTRS_PER_P4D; - for (i = 0; i < nr; i++) { - pud_t *pud; - if (p4d_none(p4d[i])) - continue; + if (p4d_none(*p4d)) + return flush; - pud = pud_offset(&p4d[i], 0); - if (PTRS_PER_PUD > 1) - flush |= (*func)(mm, virt_to_page(pud), PT_PUD); - flush |= xen_pud_walk(mm, pud, func, - last && i == nr - 1, limit); - } + pud = pud_offset(p4d, 0); + if (PTRS_PER_PUD > 1) + flush |= (*func)(mm, virt_to_page(pud), PT_PUD); + flush |= xen_pud_walk(mm, pud, func, last, limit); return flush; } @@ -644,8 +640,6 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd, continue; p4d = p4d_offset(&pgd[i], 0); - if (PTRS_PER_P4D > 1) - flush |= (*func)(mm, virt_to_page(p4d), PT_P4D); flush |= xen_p4d_walk(mm, p4d, func, i == nr - 1, limit); } @@ -1176,22 +1170,14 @@ static void __init xen_cleanmfnmap(unsigned long vaddr) { pgd_t *pgd; p4d_t *p4d; - unsigned int i; bool unpin; unpin = (vaddr == 2 * PGDIR_SIZE); vaddr &= PMD_MASK; pgd = pgd_offset_k(vaddr); p4d = p4d_offset(pgd, 0); - for (i = 0; i < PTRS_PER_P4D; i++) { - if (p4d_none(p4d[i])) - continue; - xen_cleanmfnmap_p4d(p4d + i, unpin); - } - if (IS_ENABLED(CONFIG_X86_5LEVEL)) { - set_pgd(pgd, __pgd(0)); - xen_cleanmfnmap_free_pgtbl(p4d, unpin); - } + if (!p4d_none(*p4d)) + xen_cleanmfnmap_p4d(p4d, unpin); } static void __init xen_pagetable_p2m_free(void) @@ -1692,7 +1678,7 @@ static void xen_release_pmd(unsigned long pfn) xen_release_ptpage(pfn, PT_PMD); } -#if CONFIG_PGTABLE_LEVELS >= 4 +#ifdef CONFIG_X86_64 static void xen_alloc_pud(struct mm_struct *mm, unsigned long pfn) { xen_alloc_ptpage(mm, pfn, PT_PUD); @@ -2029,13 +2015,12 @@ static phys_addr_t __init xen_early_virt_to_phys(unsigned long vaddr) */ void __init xen_relocate_p2m(void) { - phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys, p4d_phys; + phys_addr_t size, new_area, pt_phys, pmd_phys, pud_phys; unsigned long p2m_pfn, p2m_pfn_end, n_frames, pfn, pfn_end; - int n_pte, n_pt, n_pmd, n_pud, n_p4d, idx_pte, idx_pt, idx_pmd, idx_pud, idx_p4d; + int n_pte, n_pt, n_pmd, n_pud, idx_pte, idx_pt, idx_pmd, idx_pud; pte_t *pt; pmd_t *pmd; pud_t *pud; - p4d_t *p4d = NULL; pgd_t *pgd; unsigned long *new_p2m; int save_pud; @@ -2045,11 +2030,7 @@ void __init xen_relocate_p2m(void) n_pt = roundup(size, PMD_SIZE) >> PMD_SHIFT; n_pmd = roundup(size, PUD_SIZE) >> PUD_SHIFT; n_pud = roundup(size, P4D_SIZE) >> P4D_SHIFT; - if (PTRS_PER_P4D > 1) - n_p4d = roundup(size, PGDIR_SIZE) >> PGDIR_SHIFT; - else - n_p4d = 0; - n_frames = n_pte + n_pt + n_pmd + n_pud + n_p4d; + n_frames = n_pte + n_pt + n_pmd + n_pud; new_area = xen_find_free_area(PFN_PHYS(n_frames)); if (!new_area) { @@ -2065,76 +2046,56 @@ void __init xen_relocate_p2m(void) * To avoid any possible virtual address collision, just use * 2 * PUD_SIZE for the new area. 
*/ - p4d_phys = new_area; - pud_phys = p4d_phys + PFN_PHYS(n_p4d); + pud_phys = new_area; pmd_phys = pud_phys + PFN_PHYS(n_pud); pt_phys = pmd_phys + PFN_PHYS(n_pmd); p2m_pfn = PFN_DOWN(pt_phys) + n_pt; pgd = __va(read_cr3_pa()); new_p2m = (unsigned long *)(2 * PGDIR_SIZE); - idx_p4d = 0; save_pud = n_pud; - do { - if (n_p4d > 0) { - p4d = early_memremap(p4d_phys, PAGE_SIZE); - clear_page(p4d); - n_pud = min(save_pud, PTRS_PER_P4D); - } - for (idx_pud = 0; idx_pud < n_pud; idx_pud++) { - pud = early_memremap(pud_phys, PAGE_SIZE); - clear_page(pud); - for (idx_pmd = 0; idx_pmd < min(n_pmd, PTRS_PER_PUD); - idx_pmd++) { - pmd = early_memremap(pmd_phys, PAGE_SIZE); - clear_page(pmd); - for (idx_pt = 0; idx_pt < min(n_pt, PTRS_PER_PMD); - idx_pt++) { - pt = early_memremap(pt_phys, PAGE_SIZE); - clear_page(pt); - for (idx_pte = 0; - idx_pte < min(n_pte, PTRS_PER_PTE); - idx_pte++) { - set_pte(pt + idx_pte, - pfn_pte(p2m_pfn, PAGE_KERNEL)); - p2m_pfn++; - } - n_pte -= PTRS_PER_PTE; - early_memunmap(pt, PAGE_SIZE); - make_lowmem_page_readonly(__va(pt_phys)); - pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, - PFN_DOWN(pt_phys)); - set_pmd(pmd + idx_pt, - __pmd(_PAGE_TABLE | pt_phys)); - pt_phys += PAGE_SIZE; + for (idx_pud = 0; idx_pud < n_pud; idx_pud++) { + pud = early_memremap(pud_phys, PAGE_SIZE); + clear_page(pud); + for (idx_pmd = 0; idx_pmd < min(n_pmd, PTRS_PER_PUD); + idx_pmd++) { + pmd = early_memremap(pmd_phys, PAGE_SIZE); + clear_page(pmd); + for (idx_pt = 0; idx_pt < min(n_pt, PTRS_PER_PMD); + idx_pt++) { + pt = early_memremap(pt_phys, PAGE_SIZE); + clear_page(pt); + for (idx_pte = 0; + idx_pte < min(n_pte, PTRS_PER_PTE); + idx_pte++) { + set_pte(pt + idx_pte, + pfn_pte(p2m_pfn, PAGE_KERNEL)); + p2m_pfn++; } - n_pt -= PTRS_PER_PMD; - early_memunmap(pmd, PAGE_SIZE); - make_lowmem_page_readonly(__va(pmd_phys)); - pin_pagetable_pfn(MMUEXT_PIN_L2_TABLE, - PFN_DOWN(pmd_phys)); - set_pud(pud + idx_pmd, __pud(_PAGE_TABLE | pmd_phys)); - pmd_phys += PAGE_SIZE; + n_pte -= PTRS_PER_PTE; + early_memunmap(pt, PAGE_SIZE); + make_lowmem_page_readonly(__va(pt_phys)); + pin_pagetable_pfn(MMUEXT_PIN_L1_TABLE, + PFN_DOWN(pt_phys)); + set_pmd(pmd + idx_pt, + __pmd(_PAGE_TABLE | pt_phys)); + pt_phys += PAGE_SIZE; } - n_pmd -= PTRS_PER_PUD; - early_memunmap(pud, PAGE_SIZE); - make_lowmem_page_readonly(__va(pud_phys)); - pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(pud_phys)); - if (n_p4d > 0) - set_p4d(p4d + idx_pud, __p4d(_PAGE_TABLE | pud_phys)); - else - set_pgd(pgd + 2 + idx_pud, __pgd(_PAGE_TABLE | pud_phys)); - pud_phys += PAGE_SIZE; - } - if (n_p4d > 0) { - save_pud -= PTRS_PER_P4D; - early_memunmap(p4d, PAGE_SIZE); - make_lowmem_page_readonly(__va(p4d_phys)); - pin_pagetable_pfn(MMUEXT_PIN_L4_TABLE, PFN_DOWN(p4d_phys)); - set_pgd(pgd + 2 + idx_p4d, __pgd(_PAGE_TABLE | p4d_phys)); - p4d_phys += PAGE_SIZE; + n_pt -= PTRS_PER_PMD; + early_memunmap(pmd, PAGE_SIZE); + make_lowmem_page_readonly(__va(pmd_phys)); + pin_pagetable_pfn(MMUEXT_PIN_L2_TABLE, + PFN_DOWN(pmd_phys)); + set_pud(pud + idx_pmd, __pud(_PAGE_TABLE | pmd_phys)); + pmd_phys += PAGE_SIZE; } - } while (++idx_p4d < n_p4d); + n_pmd -= PTRS_PER_PUD; + early_memunmap(pud, PAGE_SIZE); + make_lowmem_page_readonly(__va(pud_phys)); + pin_pagetable_pfn(MMUEXT_PIN_L3_TABLE, PFN_DOWN(pud_phys)); + set_pgd(pgd + 2 + idx_pud, __pgd(_PAGE_TABLE | pud_phys)); + pud_phys += PAGE_SIZE; + } /* Now copy the old p2m info to the new area. 
*/ memcpy(new_p2m, xen_p2m_addr, size); @@ -2361,7 +2322,7 @@ static void __init xen_post_allocator_init(void) pv_mmu_ops.set_pte = xen_set_pte; pv_mmu_ops.set_pmd = xen_set_pmd; pv_mmu_ops.set_pud = xen_set_pud; -#if CONFIG_PGTABLE_LEVELS >= 4 +#ifdef CONFIG_X86_64 pv_mmu_ops.set_p4d = xen_set_p4d; #endif @@ -2371,7 +2332,7 @@ static void __init xen_post_allocator_init(void) pv_mmu_ops.alloc_pmd = xen_alloc_pmd; pv_mmu_ops.release_pte = xen_release_pte; pv_mmu_ops.release_pmd = xen_release_pmd; -#if CONFIG_PGTABLE_LEVELS >= 4 +#ifdef CONFIG_X86_64 pv_mmu_ops.alloc_pud = xen_alloc_pud; pv_mmu_ops.release_pud = xen_release_pud; #endif @@ -2435,14 +2396,14 @@ static const struct pv_mmu_ops xen_mmu_ops __initconst = { .make_pmd = PV_CALLEE_SAVE(xen_make_pmd), .pmd_val = PV_CALLEE_SAVE(xen_pmd_val), -#if CONFIG_PGTABLE_LEVELS >= 4 +#ifdef CONFIG_X86_64 .pud_val = PV_CALLEE_SAVE(xen_pud_val), .make_pud = PV_CALLEE_SAVE(xen_make_pud), .set_p4d = xen_set_p4d_hyper, .alloc_pud = xen_alloc_pmd_init, .release_pud = xen_release_pmd_init, -#endif /* CONFIG_PGTABLE_LEVELS == 4 */ +#endif /* CONFIG_X86_64 */ .activate_mm = xen_activate_mm, .dup_mmap = xen_dup_mmap, From 09807080a97cedc160bc42efd3e8c3a3b72a29c7 Mon Sep 17 00:00:00 2001 From: Dongjiu Geng Date: Tue, 17 Oct 2017 16:02:20 +0800 Subject: [PATCH 0581/1013] ACPI / APEI: remove the unused dead-code for SEA/NMI notification type commit c49870e89f4d2c21c76ebe90568246bb0f3572b7 upstream. For the SEA notification, the two functions ghes_sea_add() and ghes_sea_remove() are only called when CONFIG_ACPI_APEI_SEA is defined. If not, it will return errors in the ghes_probe() and not continue. If the probe is failed, the ghes_sea_remove() also has no chance to be called. Hence, remove the unnecessary handling when CONFIG_ACPI_APEI_SEA is not defined. For the NMI notification, it has the same issue as SEA notification, so also remove the unused dead-code for it. Signed-off-by: Dongjiu Geng Tested-by: Tyler Baicar Reviewed-by: Borislav Petkov Signed-off-by: Rafael J. 
Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/apei/ghes.c | 33 +++++---------------------------- 1 file changed, 5 insertions(+), 28 deletions(-) diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index 69ef0b6bf25d..cb7aceae3553 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -852,17 +852,8 @@ static void ghes_sea_remove(struct ghes *ghes) synchronize_rcu(); } #else /* CONFIG_ACPI_APEI_SEA */ -static inline void ghes_sea_add(struct ghes *ghes) -{ - pr_err(GHES_PFX "ID: %d, trying to add SEA notification which is not supported\n", - ghes->generic->header.source_id); -} - -static inline void ghes_sea_remove(struct ghes *ghes) -{ - pr_err(GHES_PFX "ID: %d, trying to remove SEA notification which is not supported\n", - ghes->generic->header.source_id); -} +static inline void ghes_sea_add(struct ghes *ghes) { } +static inline void ghes_sea_remove(struct ghes *ghes) { } #endif /* CONFIG_ACPI_APEI_SEA */ #ifdef CONFIG_HAVE_ACPI_APEI_NMI @@ -1064,23 +1055,9 @@ static void ghes_nmi_init_cxt(void) init_irq_work(&ghes_proc_irq_work, ghes_proc_in_irq); } #else /* CONFIG_HAVE_ACPI_APEI_NMI */ -static inline void ghes_nmi_add(struct ghes *ghes) -{ - pr_err(GHES_PFX "ID: %d, trying to add NMI notification which is not supported!\n", - ghes->generic->header.source_id); - BUG(); -} - -static inline void ghes_nmi_remove(struct ghes *ghes) -{ - pr_err(GHES_PFX "ID: %d, trying to remove NMI notification which is not supported!\n", - ghes->generic->header.source_id); - BUG(); -} - -static inline void ghes_nmi_init_cxt(void) -{ -} +static inline void ghes_nmi_add(struct ghes *ghes) { } +static inline void ghes_nmi_remove(struct ghes *ghes) { } +static inline void ghes_nmi_init_cxt(void) { } #endif /* CONFIG_HAVE_ACPI_APEI_NMI */ static int ghes_probe(struct platform_device *ghes_dev) From 20e9bfd7b8a3463c5ecc1d22b60be30a32607dc4 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 20 Oct 2017 11:21:35 -0500 Subject: [PATCH 0582/1013] x86/asm: Don't use the confusing '.ifeq' directive commit 82c62fa0c49aa305104013cee4468772799bb391 upstream. I find the '.ifeq ' directive to be confusing. Reading it quickly seems to suggest its opposite meaning, or that it's missing an argument. Improve readability by replacing all of its x86 uses with '.if == 0'. 
Signed-off-by: Josh Poimboeuf Cc: Andrei Vagin Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/757da028e802c7e98d23fbab8d234b1063e161cf.1508516398.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 2 +- arch/x86/kernel/head_32.S | 2 +- arch/x86/kernel/head_64.S | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 2e956afe272c..5557e94c5f1c 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -818,7 +818,7 @@ ENTRY(\sym) ASM_CLAC - .ifeq \has_error_code + .if \has_error_code == 0 pushq $-1 /* ORIG_RAX: no syscall to restart */ .endif diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index f1d528bb66a6..46c9f17b18e9 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -402,7 +402,7 @@ ENTRY(early_idt_handler_array) # 24(%rsp) error code i = 0 .rept NUM_EXCEPTION_VECTORS - .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 + .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0 pushl $0 # Dummy error code, to make stack frame uniform .endif pushl $i # 20(%esp) Vector number diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 9072d324d430..078eab728b47 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -275,7 +275,7 @@ ENDPROC(start_cpu0) ENTRY(early_idt_handler_array) i = 0 .rept NUM_EXCEPTION_VECTORS - .ifeq (EXCEPTION_ERRCODE_MASK >> i) & 1 + .if ((EXCEPTION_ERRCODE_MASK >> i) & 1) == 0 UNWIND_HINT_IRET_REGS pushq $0 # Dummy error code, to make stack frame uniform .else From 3df257ddc5f48214e229b26f3e3d5f42e0f45380 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Fri, 27 Oct 2017 13:11:10 +0900 Subject: [PATCH 0583/1013] x86/build: Beautify build log of syscall headers commit af8e947079a7dab0480b5d6db6b093fd04b86fc9 upstream. This makes the build log look nicer. Before: SYSTBL arch/x86/entry/syscalls/../../include/generated/asm/syscalls_32.h SYSHDR arch/x86/entry/syscalls/../../include/generated/asm/unistd_32_ia32.h SYSHDR arch/x86/entry/syscalls/../../include/generated/asm/unistd_64_x32.h SYSTBL arch/x86/entry/syscalls/../../include/generated/asm/syscalls_64.h SYSHDR arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_32.h SYSHDR arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_64.h SYSHDR arch/x86/entry/syscalls/../../include/generated/uapi/asm/unistd_x32.h After: SYSTBL arch/x86/include/generated/asm/syscalls_32.h SYSHDR arch/x86/include/generated/asm/unistd_32_ia32.h SYSHDR arch/x86/include/generated/asm/unistd_64_x32.h SYSTBL arch/x86/include/generated/asm/syscalls_64.h SYSHDR arch/x86/include/generated/uapi/asm/unistd_32.h SYSHDR arch/x86/include/generated/uapi/asm/unistd_64.h SYSHDR arch/x86/include/generated/uapi/asm/unistd_x32.h Signed-off-by: Masahiro Yamada Acked-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Cc: "H. 
Peter Anvin" Cc: linux-kbuild@vger.kernel.org Link: http://lkml.kernel.org/r/1509077470-2735-1-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/syscalls/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/syscalls/Makefile b/arch/x86/entry/syscalls/Makefile index 331f1dca5085..6fb9b57ed5ba 100644 --- a/arch/x86/entry/syscalls/Makefile +++ b/arch/x86/entry/syscalls/Makefile @@ -1,6 +1,6 @@ # SPDX-License-Identifier: GPL-2.0 -out := $(obj)/../../include/generated/asm -uapi := $(obj)/../../include/generated/uapi/asm +out := arch/$(SRCARCH)/include/generated/asm +uapi := arch/$(SRCARCH)/include/generated/uapi/asm # Create output directory if not already present _dummy := $(shell [ -d '$(out)' ] || mkdir -p '$(out)') \ From c60238f5712ba5a7d8391914f8290ae1bf5c08c0 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Sat, 28 Oct 2017 09:30:38 +0800 Subject: [PATCH 0584/1013] x86/mm/64: Rename the register_page_bootmem_memmap() 'size' parameter to 'nr_pages' commit 15670bfe19905b1dcbb63137f40d718b59d84479 upstream. register_page_bootmem_memmap()'s 3rd 'size' parameter is named in a somewhat misleading fashion - rename it to 'nr_pages' which makes the units of it much clearer. Meanwhile rename the existing local variable 'nr_pages' to 'nr_pmd_pages', a more expressive name, to avoid conflict with new function parameter 'nr_pages'. (Also clean up the unnecessary parentheses in which get_order() is called.) Signed-off-by: Baoquan He Acked-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Cc: akpm@linux-foundation.org Link: http://lkml.kernel.org/r/1509154238-23250-1-git-send-email-bhe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init_64.c | 10 +++++----- include/linux/mm.h | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 048fbe8fc274..adcea90a2046 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1426,16 +1426,16 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node) #if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HAVE_BOOTMEM_INFO_NODE) void register_page_bootmem_memmap(unsigned long section_nr, - struct page *start_page, unsigned long size) + struct page *start_page, unsigned long nr_pages) { unsigned long addr = (unsigned long)start_page; - unsigned long end = (unsigned long)(start_page + size); + unsigned long end = (unsigned long)(start_page + nr_pages); unsigned long next; pgd_t *pgd; p4d_t *p4d; pud_t *pud; pmd_t *pmd; - unsigned int nr_pages; + unsigned int nr_pmd_pages; struct page *page; for (; addr < end; addr = next) { @@ -1482,9 +1482,9 @@ void register_page_bootmem_memmap(unsigned long section_nr, if (pmd_none(*pmd)) continue; - nr_pages = 1 << (get_order(PMD_SIZE)); + nr_pmd_pages = 1 << get_order(PMD_SIZE); page = pmd_page(*pmd); - while (nr_pages--) + while (nr_pmd_pages--) get_page_bootmem(section_nr, page++, SECTION_INFO); } diff --git a/include/linux/mm.h b/include/linux/mm.h index db647d428100..f50deada0f5c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2510,7 +2510,7 @@ void vmemmap_populate_print_last(void); void vmemmap_free(unsigned long start, unsigned long end); #endif void register_page_bootmem_memmap(unsigned long section_nr, struct page *map, - unsigned long size); + unsigned long nr_pages); enum mf_flags { MF_COUNT_INCREASED = 1 << 0, From 
12dc3fa30178a4635e435ed2b4e543f76c10dc93 Mon Sep 17 00:00:00 2001 From: Gayatri Kammela Date: Mon, 30 Oct 2017 18:20:29 -0700 Subject: [PATCH 0585/1013] x86/cpufeatures: Enable new SSE/AVX/AVX512 CPU features commit c128dbfa0f879f8ce7b79054037889b0b2240728 upstream. Add a few new SSE/AVX/AVX512 instruction groups/features for enumeration in /proc/cpuinfo: AVX512_VBMI2, GFNI, VAES, VPCLMULQDQ, AVX512_VNNI, AVX512_BITALG. CPUID.(EAX=7,ECX=0):ECX[bit 6] AVX512_VBMI2 CPUID.(EAX=7,ECX=0):ECX[bit 8] GFNI CPUID.(EAX=7,ECX=0):ECX[bit 9] VAES CPUID.(EAX=7,ECX=0):ECX[bit 10] VPCLMULQDQ CPUID.(EAX=7,ECX=0):ECX[bit 11] AVX512_VNNI CPUID.(EAX=7,ECX=0):ECX[bit 12] AVX512_BITALG Detailed information of CPUID bits for these features can be found in the Intel Architecture Instruction Set Extensions and Future Features Programming Interface document (refer to Table 1-1. and Table 1-2.). A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=197239 Signed-off-by: Gayatri Kammela Acked-by: Thomas Gleixner Cc: Andi Kleen Cc: Fenghua Yu Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Ravi Shankar Cc: Ricardo Neri Cc: Yang Zhong Cc: bp@alien8.de Link: http://lkml.kernel.org/r/1509412829-23380-1-git-send-email-gayatri.kammela@intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 6 ++++++ arch/x86/kernel/cpu/cpuid-deps.c | 6 ++++++ 2 files changed, 12 insertions(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 9a0a5b128a30..74370734663c 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -300,6 +300,12 @@ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ +#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ +#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ +#define X86_FEATURE_VPCLMULQDQ (16*32+ 10) /* Carry-Less Multiplication Double Quadword */ +#define X86_FEATURE_AVX512_VNNI (16*32+ 11) /* Vector Neural Network Instructions */ +#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ #define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ #define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index c1d49842a411..c21f22d836ad 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -50,6 +50,12 @@ const static struct cpuid_dep cpuid_deps[] = { { X86_FEATURE_AVX512BW, X86_FEATURE_AVX512F }, { X86_FEATURE_AVX512VL, X86_FEATURE_AVX512F }, { X86_FEATURE_AVX512VBMI, X86_FEATURE_AVX512F }, + { X86_FEATURE_AVX512_VBMI2, X86_FEATURE_AVX512VL }, + { X86_FEATURE_GFNI, X86_FEATURE_AVX512VL }, + { X86_FEATURE_VAES, X86_FEATURE_AVX512VL }, + { X86_FEATURE_VPCLMULQDQ, X86_FEATURE_AVX512VL }, + { X86_FEATURE_AVX512_VNNI, X86_FEATURE_AVX512VL }, + { X86_FEATURE_AVX512_BITALG, X86_FEATURE_AVX512VL }, { X86_FEATURE_AVX512_4VNNIW, X86_FEATURE_AVX512F }, { X86_FEATURE_AVX512_4FMAPS, X86_FEATURE_AVX512F }, { X86_FEATURE_AVX512_VPOPCNTDQ, X86_FEATURE_AVX512F }, From 4ed772a7dee902953066869264934c36c7689097 Mon Sep 17 00:00:00 
2001 From: Ricardo Neri Date: Fri, 27 Oct 2017 13:25:28 -0700 Subject: [PATCH 0586/1013] x86/mm: Relocate page fault error codes to traps.h commit 1067f030994c69ca1fba8c607437c8895dcf8509 upstream. Up to this point, only fault.c used the definitions of the page fault error codes. Thus, it made sense to keep them within such file. Other portions of code might be interested in those definitions too. For instance, the User- Mode Instruction Prevention emulation code will use such definitions to emulate a page fault when it is unable to successfully copy the results of the emulated instructions to user space. While relocating the error code enumeration, the prefix X86_ is used to make it consistent with the rest of the definitions in traps.h. Of course, code using the enumeration had to be updated as well. No functional changes were performed. Signed-off-by: Ricardo Neri Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Andy Lutomirski Cc: "Michael S. Tsirkin" Cc: Peter Zijlstra Cc: Dave Hansen Cc: ricardo.neri@intel.com Cc: Paul Gortmaker Cc: Huang Rui Cc: Shuah Khan Cc: Jonathan Corbet Cc: Jiri Slaby Cc: "Ravi V. Shankar" Cc: Chris Metcalf Cc: Brian Gerst Cc: Josh Poimboeuf Cc: Chen Yucong Cc: Vlastimil Babka Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Andrew Morton Cc: "Kirill A. Shutemov" Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/traps.h | 18 ++++++++ arch/x86/mm/fault.c | 88 ++++++++++++++---------------------- 2 files changed, 52 insertions(+), 54 deletions(-) diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index b0cced97a6ce..a230dd1896f0 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -145,4 +145,22 @@ enum { X86_TRAP_IRET = 32, /* 32, IRET Exception */ }; +/* + * Page fault error code bits: + * + * bit 0 == 0: no page found 1: protection fault + * bit 1 == 0: read access 1: write access + * bit 2 == 0: kernel-mode access 1: user-mode access + * bit 3 == 1: use of reserved bit detected + * bit 4 == 1: fault was an instruction fetch + * bit 5 == 1: protection keys block access + */ +enum x86_pf_error_code { + X86_PF_PROT = 1 << 0, + X86_PF_WRITE = 1 << 1, + X86_PF_USER = 1 << 2, + X86_PF_RSVD = 1 << 3, + X86_PF_INSTR = 1 << 4, + X86_PF_PK = 1 << 5, +}; #endif /* _ASM_X86_TRAPS_H */ diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index b0ff378650a9..3109ba6c6ede 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -29,26 +29,6 @@ #define CREATE_TRACE_POINTS #include -/* - * Page fault error code bits: - * - * bit 0 == 0: no page found 1: protection fault - * bit 1 == 0: read access 1: write access - * bit 2 == 0: kernel-mode access 1: user-mode access - * bit 3 == 1: use of reserved bit detected - * bit 4 == 1: fault was an instruction fetch - * bit 5 == 1: protection keys block access - */ -enum x86_pf_error_code { - - PF_PROT = 1 << 0, - PF_WRITE = 1 << 1, - PF_USER = 1 << 2, - PF_RSVD = 1 << 3, - PF_INSTR = 1 << 4, - PF_PK = 1 << 5, -}; - /* * Returns 0 if mmiotrace is disabled, or if the fault is not * handled by mmiotrace: @@ -150,7 +130,7 @@ is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * If it was a exec (instruction fetch) fault on NX page, then * do not ignore the fault: */ - if (error_code & PF_INSTR) + if (error_code & X86_PF_INSTR) return 0; instr = (void *)convert_ip_to_linear(current, regs); @@ -180,7 +160,7 @@ 
is_prefetch(struct pt_regs *regs, unsigned long error_code, unsigned long addr) * siginfo so userspace can discover which protection key was set * on the PTE. * - * If we get here, we know that the hardware signaled a PF_PK + * If we get here, we know that the hardware signaled a X86_PF_PK * fault and that there was a VMA once we got in the fault * handler. It does *not* guarantee that the VMA we find here * was the one that we faulted on. @@ -205,7 +185,7 @@ static void fill_sig_info_pkey(int si_code, siginfo_t *info, u32 *pkey) /* * force_sig_info_fault() is called from a number of * contexts, some of which have a VMA and some of which - * do not. The PF_PK handing happens after we have a + * do not. The X86_PF_PK handing happens after we have a * valid VMA, so we should never reach this without a * valid VMA. */ @@ -698,7 +678,7 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, if (!oops_may_print()) return; - if (error_code & PF_INSTR) { + if (error_code & X86_PF_INSTR) { unsigned int level; pgd_t *pgd; pte_t *pte; @@ -780,7 +760,7 @@ no_context(struct pt_regs *regs, unsigned long error_code, */ if (current->thread.sig_on_uaccess_err && signal) { tsk->thread.trap_nr = X86_TRAP_PF; - tsk->thread.error_code = error_code | PF_USER; + tsk->thread.error_code = error_code | X86_PF_USER; tsk->thread.cr2 = address; /* XXX: hwpoison faults will set the wrong code. */ @@ -898,7 +878,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, struct task_struct *tsk = current; /* User mode accesses just cause a SIGSEGV */ - if (error_code & PF_USER) { + if (error_code & X86_PF_USER) { /* * It's possible to have interrupts off here: */ @@ -919,7 +899,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, * Instruction fetch faults in the vsyscall page might need * emulation. */ - if (unlikely((error_code & PF_INSTR) && + if (unlikely((error_code & X86_PF_INSTR) && ((address & ~0xfff) == VSYSCALL_ADDR))) { if (emulate_vsyscall(regs, address)) return; @@ -932,7 +912,7 @@ __bad_area_nosemaphore(struct pt_regs *regs, unsigned long error_code, * are always protection faults. */ if (address >= TASK_SIZE_MAX) - error_code |= PF_PROT; + error_code |= X86_PF_PROT; if (likely(show_unhandled_signals)) show_signal_msg(regs, error_code, address, tsk); @@ -993,11 +973,11 @@ static inline bool bad_area_access_from_pkeys(unsigned long error_code, if (!boot_cpu_has(X86_FEATURE_OSPKE)) return false; - if (error_code & PF_PK) + if (error_code & X86_PF_PK) return true; /* this checks permission keys on the VMA: */ - if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE), - (error_code & PF_INSTR), foreign)) + if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE), + (error_code & X86_PF_INSTR), foreign)) return true; return false; } @@ -1025,7 +1005,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, int code = BUS_ADRERR; /* Kernel mode? Handle exceptions or die: */ - if (!(error_code & PF_USER)) { + if (!(error_code & X86_PF_USER)) { no_context(regs, error_code, address, SIGBUS, BUS_ADRERR); return; } @@ -1053,14 +1033,14 @@ static noinline void mm_fault_error(struct pt_regs *regs, unsigned long error_code, unsigned long address, u32 *pkey, unsigned int fault) { - if (fatal_signal_pending(current) && !(error_code & PF_USER)) { + if (fatal_signal_pending(current) && !(error_code & X86_PF_USER)) { no_context(regs, error_code, address, 0, 0); return; } if (fault & VM_FAULT_OOM) { /* Kernel mode? 
Handle exceptions or die: */ - if (!(error_code & PF_USER)) { + if (!(error_code & X86_PF_USER)) { no_context(regs, error_code, address, SIGSEGV, SEGV_MAPERR); return; @@ -1085,16 +1065,16 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, static int spurious_fault_check(unsigned long error_code, pte_t *pte) { - if ((error_code & PF_WRITE) && !pte_write(*pte)) + if ((error_code & X86_PF_WRITE) && !pte_write(*pte)) return 0; - if ((error_code & PF_INSTR) && !pte_exec(*pte)) + if ((error_code & X86_PF_INSTR) && !pte_exec(*pte)) return 0; /* * Note: We do not do lazy flushing on protection key - * changes, so no spurious fault will ever set PF_PK. + * changes, so no spurious fault will ever set X86_PF_PK. */ - if ((error_code & PF_PK)) + if ((error_code & X86_PF_PK)) return 1; return 1; @@ -1140,8 +1120,8 @@ spurious_fault(unsigned long error_code, unsigned long address) * change, so user accesses are not expected to cause spurious * faults. */ - if (error_code != (PF_WRITE | PF_PROT) - && error_code != (PF_INSTR | PF_PROT)) + if (error_code != (X86_PF_WRITE | X86_PF_PROT) && + error_code != (X86_PF_INSTR | X86_PF_PROT)) return 0; pgd = init_mm.pgd + pgd_index(address); @@ -1201,19 +1181,19 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) * always an unconditional error and can never result in * a follow-up action to resolve the fault, like a COW. */ - if (error_code & PF_PK) + if (error_code & X86_PF_PK) return 1; /* * Make sure to check the VMA so that we do not perform - * faults just to hit a PF_PK as soon as we fill in a + * faults just to hit a X86_PF_PK as soon as we fill in a * page. */ - if (!arch_vma_access_permitted(vma, (error_code & PF_WRITE), - (error_code & PF_INSTR), foreign)) + if (!arch_vma_access_permitted(vma, (error_code & X86_PF_WRITE), + (error_code & X86_PF_INSTR), foreign)) return 1; - if (error_code & PF_WRITE) { + if (error_code & X86_PF_WRITE) { /* write, present and write, not present: */ if (unlikely(!(vma->vm_flags & VM_WRITE))) return 1; @@ -1221,7 +1201,7 @@ access_error(unsigned long error_code, struct vm_area_struct *vma) } /* read, present: */ - if (unlikely(error_code & PF_PROT)) + if (unlikely(error_code & X86_PF_PROT)) return 1; /* read, not present: */ @@ -1244,7 +1224,7 @@ static inline bool smap_violation(int error_code, struct pt_regs *regs) if (!static_cpu_has(X86_FEATURE_SMAP)) return false; - if (error_code & PF_USER) + if (error_code & X86_PF_USER) return false; if (!user_mode(regs) && (regs->flags & X86_EFLAGS_AC)) @@ -1297,7 +1277,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * protection error (error_code & 9) == 0. 
*/ if (unlikely(fault_in_kernel_space(address))) { - if (!(error_code & (PF_RSVD | PF_USER | PF_PROT))) { + if (!(error_code & (X86_PF_RSVD | X86_PF_USER | X86_PF_PROT))) { if (vmalloc_fault(address) >= 0) return; @@ -1325,7 +1305,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, if (unlikely(kprobes_fault(regs))) return; - if (unlikely(error_code & PF_RSVD)) + if (unlikely(error_code & X86_PF_RSVD)) pgtable_bad(regs, error_code, address); if (unlikely(smap_violation(error_code, regs))) { @@ -1351,7 +1331,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, */ if (user_mode(regs)) { local_irq_enable(); - error_code |= PF_USER; + error_code |= X86_PF_USER; flags |= FAULT_FLAG_USER; } else { if (regs->flags & X86_EFLAGS_IF) @@ -1360,9 +1340,9 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS, 1, regs, address); - if (error_code & PF_WRITE) + if (error_code & X86_PF_WRITE) flags |= FAULT_FLAG_WRITE; - if (error_code & PF_INSTR) + if (error_code & X86_PF_INSTR) flags |= FAULT_FLAG_INSTRUCTION; /* @@ -1382,7 +1362,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, * space check, thus avoiding the deadlock: */ if (unlikely(!down_read_trylock(&mm->mmap_sem))) { - if ((error_code & PF_USER) == 0 && + if (!(error_code & X86_PF_USER) && !search_exception_tables(regs->ip)) { bad_area_nosemaphore(regs, error_code, address, NULL); return; @@ -1409,7 +1389,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long error_code, bad_area(regs, error_code, address); return; } - if (error_code & PF_USER) { + if (error_code & X86_PF_USER) { /* * Accessing the stack below %sp is always a bug. * The large cushion allows instructions like enter From 985cba48423594b8e8792770a65543d7ee689eba Mon Sep 17 00:00:00 2001 From: Ricardo Neri Date: Fri, 27 Oct 2017 13:25:29 -0700 Subject: [PATCH 0587/1013] x86/boot: Relocate definition of the initial state of CR0 commit b0ce5b8c95c83a7b98c679b117e3d6ae6f97154b upstream. Both head_32.S and head_64.S utilize the same value to initialize the control register CR0. Also, other parts of the kernel might want to access this initial definition (e.g., emulation code for User-Mode Instruction Prevention uses this state to provide a sane dummy value for CR0 when emulating the smsw instruction). Thus, relocate this definition to a header file from which it can be conveniently accessed. Suggested-by: Borislav Petkov Signed-off-by: Ricardo Neri Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Andy Lutomirski Cc: "Michael S. Tsirkin" Cc: Peter Zijlstra Cc: Dave Hansen Cc: ricardo.neri@intel.com Cc: linux-mm@kvack.org Cc: Paul Gortmaker Cc: Huang Rui Cc: Shuah Khan Cc: linux-arch@vger.kernel.org Cc: Jonathan Corbet Cc: Jiri Slaby Cc: "Ravi V. 
Shankar" Cc: Denys Vlasenko Cc: Chris Metcalf Cc: Brian Gerst Cc: Josh Poimboeuf Cc: Chen Yucong Cc: Vlastimil Babka Cc: Dave Hansen Cc: Andy Lutomirski Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Andrew Morton Cc: Linus Torvalds Link: https://lkml.kernel.org/r/1509135945-13762-3-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/uapi/asm/processor-flags.h | 3 +++ arch/x86/kernel/head_32.S | 3 --- arch/x86/kernel/head_64.S | 3 --- 3 files changed, 3 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index 6f3355399665..53b4ca55ebb6 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -152,5 +152,8 @@ #define CX86_ARR_BASE 0xc4 #define CX86_RCR_BASE 0xdc +#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ + X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ + X86_CR0_PG) #endif /* _UAPI_ASM_X86_PROCESSOR_FLAGS_H */ diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S index 46c9f17b18e9..c29020907886 100644 --- a/arch/x86/kernel/head_32.S +++ b/arch/x86/kernel/head_32.S @@ -212,9 +212,6 @@ ENTRY(startup_32_smp) #endif .Ldefault_entry: -#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ - X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ - X86_CR0_PG) movl $(CR0_STATE & ~X86_CR0_PG),%eax movl %eax,%cr0 diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 078eab728b47..1f747734318d 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -154,9 +154,6 @@ ENTRY(secondary_startup_64) 1: wrmsr /* Make changes effective */ /* Setup cr0 */ -#define CR0_STATE (X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | \ - X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | \ - X86_CR0_PG) movl $CR0_STATE, %eax /* Make changes effective */ movq %rax, %cr0 From ab3e5dfff36f4af4df09d86384376ebd188d7ae9 Mon Sep 17 00:00:00 2001 From: Ricardo Neri Date: Fri, 27 Oct 2017 13:25:30 -0700 Subject: [PATCH 0588/1013] ptrace,x86: Make user_64bit_mode() available to 32-bit builds commit e27c310af5c05cf876d9cad006928076c27f54d4 upstream. In its current form, user_64bit_mode() can only be used when CONFIG_X86_64 is selected. This implies that code built with CONFIG_X86_64=n cannot use it. If a piece of code needs to be built for both CONFIG_X86_64=y and CONFIG_X86_64=n and wants to use this function, it needs to wrap it in an #ifdef/#endif; potentially, in multiple places. This can be easily avoided with a single #ifdef/#endif pair within user_64bit_mode() itself. Suggested-by: Borislav Petkov Signed-off-by: Ricardo Neri Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: "Michael S. Tsirkin" Cc: Peter Zijlstra Cc: Dave Hansen Cc: ricardo.neri@intel.com Cc: Adrian Hunter Cc: Paul Gortmaker Cc: Huang Rui Cc: Qiaowei Ren Cc: Shuah Khan Cc: Kees Cook Cc: Jonathan Corbet Cc: Jiri Slaby Cc: Dmitry Vyukov Cc: "Ravi V. 
Shankar" Cc: Chris Metcalf Cc: Brian Gerst Cc: Arnaldo Carvalho de Melo Cc: Andy Lutomirski Cc: Colin Ian King Cc: Chen Yucong Cc: Adam Buchbinder Cc: Vlastimil Babka Cc: Lorenzo Stoakes Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Andrew Morton Cc: Thomas Garnier Link: https://lkml.kernel.org/r/1509135945-13762-4-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/ptrace.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h index c0e3c45cf6ab..14131dd06b29 100644 --- a/arch/x86/include/asm/ptrace.h +++ b/arch/x86/include/asm/ptrace.h @@ -136,9 +136,9 @@ static inline int v8086_mode(struct pt_regs *regs) #endif } -#ifdef CONFIG_X86_64 static inline bool user_64bit_mode(struct pt_regs *regs) { +#ifdef CONFIG_X86_64 #ifndef CONFIG_PARAVIRT /* * On non-paravirt systems, this is the only long mode CPL 3 @@ -149,8 +149,12 @@ static inline bool user_64bit_mode(struct pt_regs *regs) /* Headers are too twisted for this to go in paravirt.h. */ return regs->cs == __USER_CS || regs->cs == pv_info.extra_user_64bit_cs; #endif +#else /* !CONFIG_X86_64 */ + return false; +#endif } +#ifdef CONFIG_X86_64 #define current_user_stack_pointer() current_pt_regs()->sp #define compat_user_stack_pointer() current_pt_regs()->sp #endif From b7ee7fcca8a525c9bc636fab23779f18d3de9e80 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:58:58 -0700 Subject: [PATCH 0589/1013] x86/entry/64: Remove the restore_c_regs_and_iret label commit 9da78ba6b47b46428cfdfc0851511ab29c869798 upstream. The only user was the 64-bit opportunistic SYSRET failure path, and that path didn't really need it. This change makes the opportunistic SYSRET code a bit more straightforward and gets rid of the label. Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/be3006a7ad3326e3458cf1cc55d416252cbe1986.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 5557e94c5f1c..33eb2127eac8 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -246,7 +246,6 @@ entry_SYSCALL64_slow_path: call do_syscall_64 /* returns with IRQs disabled */ return_from_SYSCALL_64: - RESTORE_EXTRA_REGS TRACE_IRQS_IRETQ /* we're about to change IF */ /* @@ -315,6 +314,7 @@ return_from_SYSCALL_64: */ syscall_return_via_sysret: /* rcx and r11 are already restored (see code above) */ + RESTORE_EXTRA_REGS RESTORE_C_REGS_EXCEPT_RCX_R11 movq RSP(%rsp), %rsp UNWIND_HINT_EMPTY @@ -322,7 +322,7 @@ syscall_return_via_sysret: opportunistic_sysret_failed: SWAPGS - jmp restore_c_regs_and_iret + jmp restore_regs_and_iret END(entry_SYSCALL_64) ENTRY(stub_ptregs_64) @@ -639,7 +639,6 @@ retint_kernel: */ GLOBAL(restore_regs_and_iret) RESTORE_EXTRA_REGS -restore_c_regs_and_iret: RESTORE_C_REGS REMOVE_PT_GPREGS_FROM_STACK 8 INTERRUPT_RETURN From 65236abc42b068506720f4a5168e5e63636138dd Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:58:59 -0700 Subject: [PATCH 0590/1013] x86/entry/64: Split the IRET-to-user and IRET-to-kernel paths commit 26c4ef9c49d8a0341f6d97ce2cfdd55d1236ed29 upstream. These code paths will diverge soon. 
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/dccf8c7b3750199b4b30383c812d4e2931811509.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 34 +++++++++++++++++++++++--------- arch/x86/entry/entry_64_compat.S | 2 +- arch/x86/kernel/head_64.S | 2 +- 3 files changed, 27 insertions(+), 11 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 33eb2127eac8..9e262c8ccc2d 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -322,7 +322,7 @@ syscall_return_via_sysret: opportunistic_sysret_failed: SWAPGS - jmp restore_regs_and_iret + jmp restore_regs_and_return_to_usermode END(entry_SYSCALL_64) ENTRY(stub_ptregs_64) @@ -424,7 +424,7 @@ ENTRY(ret_from_fork) call syscall_return_slowpath /* returns with IRQs disabled */ TRACE_IRQS_ON /* user mode is traced as IRQS on */ SWAPGS - jmp restore_regs_and_iret + jmp restore_regs_and_return_to_usermode 1: /* kernel thread */ @@ -613,7 +613,20 @@ GLOBAL(retint_user) call prepare_exit_to_usermode TRACE_IRQS_IRETQ SWAPGS - jmp restore_regs_and_iret + +GLOBAL(restore_regs_and_return_to_usermode) +#ifdef CONFIG_DEBUG_ENTRY + /* Assert that pt_regs indicates user mode. */ + testl $3, CS(%rsp) + jnz 1f + ud2 +1: +#endif + RESTORE_EXTRA_REGS + RESTORE_C_REGS + REMOVE_PT_GPREGS_FROM_STACK 8 + INTERRUPT_RETURN + /* Returning to kernel space */ retint_kernel: @@ -633,11 +646,14 @@ retint_kernel: */ TRACE_IRQS_IRETQ -/* - * At this label, code paths which return to kernel and to user, - * which come from interrupts/exception and from syscalls, merge. - */ -GLOBAL(restore_regs_and_iret) +GLOBAL(restore_regs_and_return_to_kernel) +#ifdef CONFIG_DEBUG_ENTRY + /* Assert that pt_regs indicates kernel mode. */ + testl $3, CS(%rsp) + jz 1f + ud2 +1: +#endif RESTORE_EXTRA_REGS RESTORE_C_REGS REMOVE_PT_GPREGS_FROM_STACK 8 @@ -1328,7 +1344,7 @@ ENTRY(nmi) * work, because we don't want to enable interrupts. */ SWAPGS - jmp restore_regs_and_iret + jmp restore_regs_and_return_to_usermode .Lnmi_from_kernel: /* diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index b5c7a56ed256..2fd682212d1d 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -338,7 +338,7 @@ ENTRY(entry_INT80_compat) /* Go back to user mode. */ TRACE_IRQS_ON SWAPGS - jmp restore_regs_and_iret + jmp restore_regs_and_return_to_usermode END(entry_INT80_compat) ENTRY(stub32_clone) diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 1f747734318d..7dca675fe78d 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -328,7 +328,7 @@ early_idt_handler_common: 20: decl early_recursion_flag(%rip) - jmp restore_regs_and_iret + jmp restore_regs_and_return_to_kernel END(early_idt_handler_common) __INITDATA From 7d4bb32bc6fded10ff2874ae1d6d655b5511829c Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:00 -0700 Subject: [PATCH 0591/1013] x86/entry/64: Move SWAPGS into the common IRET-to-usermode path commit 8a055d7f411d41755ce30db5bb65b154777c4b78 upstream. All of the code paths that ended up doing IRET to usermode did SWAPGS immediately beforehand. Move the SWAPGS into the common code. 
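For readers unfamiliar with SWAPGS, a hedged conceptual sketch of why centralizing it is safe: the instruction exchanges the active GS base with the MSR_KERNEL_GS_BASE shadow, so running it exactly once in the shared exit path leaves user mode with its own GS base while the kernel keeps per-CPU data reachable via %gs. The function below is purely a model; SWAPGS is a single privileged instruction, not C code.

    /* Conceptual model only, assuming the two-MSR view described above. */
    static void swapgs_model(unsigned long *gs_base, unsigned long *kernel_gs_base)
    {
            unsigned long tmp = *gs_base;

            *gs_base = *kernel_gs_base;     /* kernel sees its per-CPU base */
            *kernel_gs_base = tmp;          /* user base is parked in the MSR */
    }
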
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/27fd6f45b7cd640de38fb9066fd0349bcd11f8e1.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 32 ++++++++++++++------------------ arch/x86/entry/entry_64_compat.S | 3 +-- 2 files changed, 15 insertions(+), 20 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 9e262c8ccc2d..f67ac2647dbf 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -250,12 +250,14 @@ return_from_SYSCALL_64: /* * Try to use SYSRET instead of IRET if we're returning to - * a completely clean 64-bit userspace context. + * a completely clean 64-bit userspace context. If we're not, + * go to the slow exit path. */ movq RCX(%rsp), %rcx movq RIP(%rsp), %r11 - cmpq %rcx, %r11 /* RCX == RIP */ - jne opportunistic_sysret_failed + + cmpq %rcx, %r11 /* SYSRET requires RCX == RIP */ + jne swapgs_restore_regs_and_return_to_usermode /* * On Intel CPUs, SYSRET with non-canonical RCX/RIP will #GP @@ -273,14 +275,14 @@ return_from_SYSCALL_64: /* If this changed %rcx, it was not canonical */ cmpq %rcx, %r11 - jne opportunistic_sysret_failed + jne swapgs_restore_regs_and_return_to_usermode cmpq $__USER_CS, CS(%rsp) /* CS must match SYSRET */ - jne opportunistic_sysret_failed + jne swapgs_restore_regs_and_return_to_usermode movq R11(%rsp), %r11 cmpq %r11, EFLAGS(%rsp) /* R11 == RFLAGS */ - jne opportunistic_sysret_failed + jne swapgs_restore_regs_and_return_to_usermode /* * SYSCALL clears RF when it saves RFLAGS in R11 and SYSRET cannot @@ -301,12 +303,12 @@ return_from_SYSCALL_64: * would never get past 'stuck_here'. */ testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11 - jnz opportunistic_sysret_failed + jnz swapgs_restore_regs_and_return_to_usermode /* nothing to check for RSP */ cmpq $__USER_DS, SS(%rsp) /* SS must match SYSRET */ - jne opportunistic_sysret_failed + jne swapgs_restore_regs_and_return_to_usermode /* * We win! This label is here just for ease of understanding @@ -319,10 +321,6 @@ syscall_return_via_sysret: movq RSP(%rsp), %rsp UNWIND_HINT_EMPTY USERGS_SYSRET64 - -opportunistic_sysret_failed: - SWAPGS - jmp restore_regs_and_return_to_usermode END(entry_SYSCALL_64) ENTRY(stub_ptregs_64) @@ -423,8 +421,7 @@ ENTRY(ret_from_fork) movq %rsp, %rdi call syscall_return_slowpath /* returns with IRQs disabled */ TRACE_IRQS_ON /* user mode is traced as IRQS on */ - SWAPGS - jmp restore_regs_and_return_to_usermode + jmp swapgs_restore_regs_and_return_to_usermode 1: /* kernel thread */ @@ -612,9 +609,8 @@ GLOBAL(retint_user) mov %rsp,%rdi call prepare_exit_to_usermode TRACE_IRQS_IRETQ - SWAPGS -GLOBAL(restore_regs_and_return_to_usermode) +GLOBAL(swapgs_restore_regs_and_return_to_usermode) #ifdef CONFIG_DEBUG_ENTRY /* Assert that pt_regs indicates user mode. */ testl $3, CS(%rsp) @@ -622,6 +618,7 @@ GLOBAL(restore_regs_and_return_to_usermode) ud2 1: #endif + SWAPGS RESTORE_EXTRA_REGS RESTORE_C_REGS REMOVE_PT_GPREGS_FROM_STACK 8 @@ -1343,8 +1340,7 @@ ENTRY(nmi) * Return back to user mode. We must *not* do the normal exit * work, because we don't want to enable interrupts. 
*/ - SWAPGS - jmp restore_regs_and_return_to_usermode + jmp swapgs_restore_regs_and_return_to_usermode .Lnmi_from_kernel: /* diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 2fd682212d1d..568e130d932c 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -337,8 +337,7 @@ ENTRY(entry_INT80_compat) /* Go back to user mode. */ TRACE_IRQS_ON - SWAPGS - jmp restore_regs_and_return_to_usermode + jmp swapgs_restore_regs_and_return_to_usermode END(entry_INT80_compat) ENTRY(stub32_clone) From 37bae8ecdb518a676c5e12520f393d0069e57dfe Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:01 -0700 Subject: [PATCH 0592/1013] x86/entry/64: Simplify reg restore code in the standard IRET paths commit e872045bfd9c465a8555bab4b8567d56a4d2d3bb upstream. The old code restored all the registers with movq instead of pop. In theory, this was done because some CPUs have higher movq throughput, but any gain there would be tiny and is almost certainly outweighed by the higher text size. This saves 96 bytes of text. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/ad82520a207ccd851b04ba613f4f752b33ac05f7.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/calling.h | 21 +++++++++++++++++++++ arch/x86/entry/entry_64.S | 12 ++++++------ 2 files changed, 27 insertions(+), 6 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 6e160031cfea..67d1df061d1a 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -152,6 +152,27 @@ For 32-bit we have the following conventions - kernel is built with UNWIND_HINT_REGS offset=\offset extra=0 .endm + .macro POP_EXTRA_REGS + popq %r15 + popq %r14 + popq %r13 + popq %r12 + popq %rbp + popq %rbx + .endm + + .macro POP_C_REGS + popq %r11 + popq %r10 + popq %r9 + popq %r8 + popq %rax + popq %rcx + popq %rdx + popq %rsi + popq %rdi + .endm + .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1 .if \rstor_r11 movq 6*8(%rsp), %r11 diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index f67ac2647dbf..77a67ed991fc 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -619,9 +619,9 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode) 1: #endif SWAPGS - RESTORE_EXTRA_REGS - RESTORE_C_REGS - REMOVE_PT_GPREGS_FROM_STACK 8 + POP_EXTRA_REGS + POP_C_REGS + addq $8, %rsp /* skip regs->orig_ax */ INTERRUPT_RETURN @@ -651,9 +651,9 @@ GLOBAL(restore_regs_and_return_to_kernel) ud2 1: #endif - RESTORE_EXTRA_REGS - RESTORE_C_REGS - REMOVE_PT_GPREGS_FROM_STACK 8 + POP_EXTRA_REGS + POP_C_REGS + addq $8, %rsp /* skip regs->orig_ax */ INTERRUPT_RETURN ENTRY(native_iret) From b774233bcdd4753267c7238a4b7754b9cc2b45ed Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:02 -0700 Subject: [PATCH 0593/1013] x86/entry/64: Shrink paranoid_exit_restore and make labels local commit e53178328c9b96fbdbc719e78c93b5687ee007c3 upstream. paranoid_exit_restore was a copy of restore_regs_and_return_to_kernel. Merge them and make the paranoid_exit internal labels local. Keeping .Lparanoid_exit makes the code a bit shorter because it allows a 2-byte jnz instead of a 5-byte jnz. Saves 96 bytes of text. 
( This is still a bit suboptimal in a non-CONFIG_TRACE_IRQFLAGS kernel, but fixing that would make the code rather messy. ) Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/510d66a1895cda9473c84b1086f0bb974f22de6a.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 77a67ed991fc..a828fb23d6fa 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1124,17 +1124,14 @@ ENTRY(paranoid_exit) DISABLE_INTERRUPTS(CLBR_ANY) TRACE_IRQS_OFF_DEBUG testl %ebx, %ebx /* swapgs needed? */ - jnz paranoid_exit_no_swapgs + jnz .Lparanoid_exit_no_swapgs TRACE_IRQS_IRETQ SWAPGS_UNSAFE_STACK - jmp paranoid_exit_restore -paranoid_exit_no_swapgs: + jmp .Lparanoid_exit_restore +.Lparanoid_exit_no_swapgs: TRACE_IRQS_IRETQ_DEBUG -paranoid_exit_restore: - RESTORE_EXTRA_REGS - RESTORE_C_REGS - REMOVE_PT_GPREGS_FROM_STACK 8 - INTERRUPT_RETURN +.Lparanoid_exit_restore: + jmp restore_regs_and_return_to_kernel END(paranoid_exit) /* From 673b1522c6585548798b281419f4ef67b8d17045 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:03 -0700 Subject: [PATCH 0594/1013] x86/entry/64: Use pop instead of movq in syscall_return_via_sysret commit 4fbb39108f972437c44e5ffa781b56635d496826 upstream. Saves 64 bytes. Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/6609b7f74ab31c36604ad746e019ea8495aec76c.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index a828fb23d6fa..74dd2ec1e4e9 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -316,10 +316,18 @@ return_from_SYSCALL_64: */ syscall_return_via_sysret: /* rcx and r11 are already restored (see code above) */ - RESTORE_EXTRA_REGS - RESTORE_C_REGS_EXCEPT_RCX_R11 - movq RSP(%rsp), %rsp UNWIND_HINT_EMPTY + POP_EXTRA_REGS + popq %rsi /* skip r11 */ + popq %r10 + popq %r9 + popq %r8 + popq %rax + popq %rsi /* skip rcx */ + popq %rdx + popq %rsi + popq %rdi + movq RSP-ORIG_RAX(%rsp), %rsp USERGS_SYSRET64 END(entry_SYSCALL_64) From e7273aae139703223ef2128c88188a6a3182e855 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:04 -0700 Subject: [PATCH 0595/1013] x86/entry/64: Merge the fast and slow SYSRET paths commit a512210643da8082cb44181dba8b18e752bd68f0 upstream. They did almost the same thing. Remove a bunch of pointless instructions (mostly hidden in macros) and reduce cognitive load by merging them. 
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1204e20233fcab9130a1ba80b3b1879b5db3fc1f.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 74dd2ec1e4e9..e6c75f6ac7e0 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -221,10 +221,9 @@ entry_SYSCALL_64_fastpath: TRACE_IRQS_ON /* user mode is traced as IRQs on */ movq RIP(%rsp), %rcx movq EFLAGS(%rsp), %r11 - RESTORE_C_REGS_EXCEPT_RCX_R11 - movq RSP(%rsp), %rsp + addq $6*8, %rsp /* skip extra regs -- they were preserved */ UNWIND_HINT_EMPTY - USERGS_SYSRET64 + jmp .Lpop_c_regs_except_rcx_r11_and_sysret 1: /* @@ -318,6 +317,7 @@ syscall_return_via_sysret: /* rcx and r11 are already restored (see code above) */ UNWIND_HINT_EMPTY POP_EXTRA_REGS +.Lpop_c_regs_except_rcx_r11_and_sysret: popq %rsi /* skip r11 */ popq %r10 popq %r9 From 2550871499e4521792db6f282964d994d25aa459 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:05 -0700 Subject: [PATCH 0596/1013] x86/entry/64: Use POP instead of MOV to restore regs on NMI return commit 471ee4832209e986029b9fabdaad57b1eecb856b upstream. This gets rid of the last user of the old RESTORE_..._REGS infrastructure. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/652a260f17a160789bc6a41d997f98249b73e2ab.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index e6c75f6ac7e0..50f887b15ac2 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1560,11 +1560,14 @@ end_repeat_nmi: nmi_swapgs: SWAPGS_UNSAFE_STACK nmi_restore: - RESTORE_EXTRA_REGS - RESTORE_C_REGS + POP_EXTRA_REGS + POP_C_REGS - /* Point RSP at the "iret" frame. */ - REMOVE_PT_GPREGS_FROM_STACK 6*8 + /* + * Skip orig_ax and the "outermost" frame to point RSP at the "iret" + * at the "iret" frame. + */ + addq $6*8, %rsp /* * Clear "NMI executing". Set DF first so that we can easily From 8d50dee92fb29c953a7101af63b0aea667c13296 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:06 -0700 Subject: [PATCH 0597/1013] x86/entry/64: Remove the RESTORE_..._REGS infrastructure commit c39858de696f0cc160a544455e8403d663d577e9 upstream. All users of RESTORE_EXTRA_REGS, RESTORE_C_REGS and such, and REMOVE_PT_GPREGS_FROM_STACK are gone. Delete the macros. 
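The pop-based restore sequences introduced earlier in this series (POP_EXTRA_REGS, POP_C_REGS, and the "addq $8, %rsp" orig_ax skip) work because the register save area on the stack mirrors the x86-64 pt_regs layout. A sketch of that layout, lowest address first, with field names inferred from the offsets in the RESTORE_* macros deleted here; the real definition lives in the kernel's ptrace headers:

    /* Illustrative layout only, not the kernel's struct definition. */
    struct pt_regs_sketch {
            unsigned long r15, r14, r13, r12, bp, bx;            /* POP_EXTRA_REGS order  */
            unsigned long r11, r10, r9, r8, ax, cx, dx, si, di;  /* POP_C_REGS order      */
            unsigned long orig_ax;                               /* skipped via addq $8   */
            unsigned long ip, cs, flags, sp, ss;                 /* hardware iret frame   */
    };
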
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/c32672f6e47c561893316d48e06c7656b1039a36.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/calling.h | 52 ---------------------------------------- 1 file changed, 52 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 67d1df061d1a..3fd8bc560fae 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -142,16 +142,6 @@ For 32-bit we have the following conventions - kernel is built with UNWIND_HINT_REGS offset=\offset .endm - .macro RESTORE_EXTRA_REGS offset=0 - movq 0*8+\offset(%rsp), %r15 - movq 1*8+\offset(%rsp), %r14 - movq 2*8+\offset(%rsp), %r13 - movq 3*8+\offset(%rsp), %r12 - movq 4*8+\offset(%rsp), %rbp - movq 5*8+\offset(%rsp), %rbx - UNWIND_HINT_REGS offset=\offset extra=0 - .endm - .macro POP_EXTRA_REGS popq %r15 popq %r14 @@ -173,48 +163,6 @@ For 32-bit we have the following conventions - kernel is built with popq %rdi .endm - .macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1 - .if \rstor_r11 - movq 6*8(%rsp), %r11 - .endif - .if \rstor_r8910 - movq 7*8(%rsp), %r10 - movq 8*8(%rsp), %r9 - movq 9*8(%rsp), %r8 - .endif - .if \rstor_rax - movq 10*8(%rsp), %rax - .endif - .if \rstor_rcx - movq 11*8(%rsp), %rcx - .endif - .if \rstor_rdx - movq 12*8(%rsp), %rdx - .endif - movq 13*8(%rsp), %rsi - movq 14*8(%rsp), %rdi - UNWIND_HINT_IRET_REGS offset=16*8 - .endm - .macro RESTORE_C_REGS - RESTORE_C_REGS_HELPER 1,1,1,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_RAX - RESTORE_C_REGS_HELPER 0,1,1,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_RCX - RESTORE_C_REGS_HELPER 1,0,1,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_R11 - RESTORE_C_REGS_HELPER 1,1,0,1,1 - .endm - .macro RESTORE_C_REGS_EXCEPT_RCX_R11 - RESTORE_C_REGS_HELPER 1,0,0,1,1 - .endm - - .macro REMOVE_PT_GPREGS_FROM_STACK addskip=0 - subq $-(15*8+\addskip), %rsp - .endm - .macro icebp .byte 0xf1 .endm From f53f7a3f0156a39eccea5bfd8a746121627c9c27 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 2 Nov 2017 00:59:07 -0700 Subject: [PATCH 0598/1013] xen, x86/entry/64: Add xen NMI trap entry commit 43e4111086a70c78bedb6ad990bee97f17b27a6e upstream. Instead of trying to execute any NMI via the bare metal's NMI trap handler use a Xen specific one for PV domains, like we do for e.g. debug traps. As in a PV domain the NMI is handled via the normal kernel stack this is the correct thing to do. This will enable us to get rid of the very fragile and questionable dependencies between the bare metal NMI handler and Xen assumptions believed to be broken anyway. 
Signed-off-by: Juergen Gross Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/5baf5c0528d58402441550c5770b98e7961e7680.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 2 +- arch/x86/include/asm/traps.h | 2 +- arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/xen-asm_64.S | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 50f887b15ac2..01348b44010c 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1079,6 +1079,7 @@ idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK idtentry stack_segment do_stack_segment has_error_code=1 #ifdef CONFIG_XEN +idtentry xennmi do_nmi has_error_code=0 idtentry xendebug do_debug has_error_code=0 idtentry xenint3 do_int3 has_error_code=0 #endif @@ -1241,7 +1242,6 @@ ENTRY(error_exit) END(error_exit) /* Runs on exception stack */ -/* XXX: broken on Xen PV */ ENTRY(nmi) UNWIND_HINT_IRET_REGS /* diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index a230dd1896f0..1fadd310ff68 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -38,9 +38,9 @@ asmlinkage void simd_coprocessor_error(void); #if defined(CONFIG_X86_64) && defined(CONFIG_XEN_PV) asmlinkage void xen_divide_error(void); +asmlinkage void xen_xennmi(void); asmlinkage void xen_xendebug(void); asmlinkage void xen_xenint3(void); -asmlinkage void xen_nmi(void); asmlinkage void xen_overflow(void); asmlinkage void xen_bounds(void); asmlinkage void xen_invalid_op(void); diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index d4396e27b1fb..60eb6dcacaac 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -601,7 +601,7 @@ static struct trap_array_entry trap_array[] = { #ifdef CONFIG_X86_MCE { machine_check, xen_machine_check, true }, #endif - { nmi, xen_nmi, true }, + { nmi, xen_xennmi, true }, { overflow, xen_overflow, false }, #ifdef CONFIG_IA32_EMULATION { entry_INT80_compat, xen_entry_INT80_compat, false }, diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S index c98a48c861fd..8a10c9a9e2b5 100644 --- a/arch/x86/xen/xen-asm_64.S +++ b/arch/x86/xen/xen-asm_64.S @@ -30,7 +30,7 @@ xen_pv_trap debug xen_pv_trap xendebug xen_pv_trap int3 xen_pv_trap xenint3 -xen_pv_trap nmi +xen_pv_trap xennmi xen_pv_trap overflow xen_pv_trap bounds xen_pv_trap invalid_op From 6ff096cf2bf86824579bdcce3743143f993fbe5c Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:08 -0700 Subject: [PATCH 0599/1013] x86/entry/64: De-Xen-ify our NMI code commit 929bacec21478a72c78e4f29f98fb799bd00105a upstream. Xen PV is fundamentally incompatible with our fancy NMI code: it doesn't use IST at all, and Xen entries clobber two stack slots below the hardware frame. Drop Xen PV support from our NMI code entirely. 
Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Acked-by: Juergen Gross Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/bfbe711b5ae03f672f8848999a8eb2711efc7f98.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 30 ++++++++++++++++++------------ 1 file changed, 18 insertions(+), 12 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 01348b44010c..a9c248b44305 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1241,9 +1241,13 @@ ENTRY(error_exit) jmp retint_user END(error_exit) -/* Runs on exception stack */ +/* + * Runs on exception stack. Xen PV does not go through this path at all, + * so we can use real assembly here. + */ ENTRY(nmi) UNWIND_HINT_IRET_REGS + /* * We allow breakpoints in NMIs. If a breakpoint occurs, then * the iretq it performs will take us out of NMI context. @@ -1301,7 +1305,7 @@ ENTRY(nmi) * stacks lest we corrupt the "NMI executing" variable. */ - SWAPGS_UNSAFE_STACK + swapgs cld movq %rsp, %rdx movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp @@ -1466,7 +1470,7 @@ nested_nmi_out: popq %rdx /* We are returning to kernel mode, so this cannot result in a fault. */ - INTERRUPT_RETURN + iretq first_nmi: /* Restore rdx. */ @@ -1497,7 +1501,7 @@ first_nmi: pushfq /* RFLAGS */ pushq $__KERNEL_CS /* CS */ pushq $1f /* RIP */ - INTERRUPT_RETURN /* continues at repeat_nmi below */ + iretq /* continues at repeat_nmi below */ UNWIND_HINT_IRET_REGS 1: #endif @@ -1572,20 +1576,22 @@ nmi_restore: /* * Clear "NMI executing". Set DF first so that we can easily * distinguish the remaining code between here and IRET from - * the SYSCALL entry and exit paths. On a native kernel, we - * could just inspect RIP, but, on paravirt kernels, - * INTERRUPT_RETURN can translate into a jump into a - * hypercall page. + * the SYSCALL entry and exit paths. + * + * We arguably should just inspect RIP instead, but I (Andy) wrote + * this code when I had the misapprehension that Xen PV supported + * NMIs, and Xen PV would break that approach. */ std movq $0, 5*8(%rsp) /* clear "NMI executing" */ /* - * INTERRUPT_RETURN reads the "iret" frame and exits the NMI - * stack in a single instruction. We are returning to kernel - * mode, so this cannot result in a fault. + * iretq reads the "iret" frame and exits the NMI stack in a + * single instruction. We are returning to kernel mode, so this + * cannot result in a fault. Similarly, we don't need to worry + * about espfix64 on the way back to kernel mode. */ - INTERRUPT_RETURN + iretq END(nmi) ENTRY(ignore_sysret) From ebef3548d5777e842acb0ab1bba599643525536c Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:09 -0700 Subject: [PATCH 0600/1013] x86/entry/32: Pull the MSR_IA32_SYSENTER_CS update code out of native_load_sp0() commit bd7dc5a6afac719d8ce4092391eef2c7e83c2a75 upstream. This causes the MSR_IA32_SYSENTER_CS write to move out of the paravirt callback. This shouldn't affect Xen PV: Xen already ignores MSR_IA32_SYSENTER_ESP writes. In any event, Xen doesn't support vm86() in a useful way. Note to any potential backporters: This patch won't break lguest, as lguest didn't have any SYSENTER support at all. 
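A minimal sketch of the caching pattern the new refresh_sysenter_cs() helper relies on: shadow the last value programmed into MSR_IA32_SYSENTER_CS and skip the WRMSR when nothing changed (the real code keeps the shadow in the per-CPU cpu_tss.x86_tss.ss1 field). The shadow variable and the write helper below are illustrative stand-ins, not kernel symbols.

    static unsigned int sysenter_cs_shadow;          /* stand-in for tss.ss1 */

    static void maybe_refresh_sysenter_cs(unsigned int cs)
    {
            if (sysenter_cs_shadow == cs)
                    return;                          /* MSR already current  */

            sysenter_cs_shadow = cs;
            write_sysenter_cs_msr(cs);               /* hypothetical WRMSR wrapper */
    }
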
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/75cf09fe03ae778532d0ca6c65aa58e66bc2f90c.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 7 ------- arch/x86/include/asm/switch_to.h | 12 ++++++++++++ arch/x86/kernel/process_32.c | 4 +++- arch/x86/kernel/process_64.c | 2 +- arch/x86/kernel/vm86_32.c | 6 +++++- 5 files changed, 21 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index bdac19ab2488..cdb4614f49fa 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -521,13 +521,6 @@ static inline void native_load_sp0(struct tss_struct *tss, struct thread_struct *thread) { tss->x86_tss.sp0 = thread->sp0; -#ifdef CONFIG_X86_32 - /* Only happens when SEP is enabled, no need to test "SEP"arately: */ - if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) { - tss->x86_tss.ss1 = thread->sysenter_cs; - wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); - } -#endif } static inline void native_swapgs(void) diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index 899084b70412..0fe929d1ff70 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -73,4 +73,16 @@ do { \ ((last) = __switch_to_asm((prev), (next))); \ } while (0) +#ifdef CONFIG_X86_32 +static inline void refresh_sysenter_cs(struct thread_struct *thread) +{ + /* Only happens when SEP is enabled, no need to test "SEP"arately: */ + if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs)) + return; + + this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs); + wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); +} +#endif + #endif /* _ASM_X86_SWITCH_TO_H */ diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 11966251cd42..0936ed3da6b6 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -284,9 +284,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) /* * Reload esp0 and cpu_current_top_of_stack. This changes - * current_thread_info(). + * current_thread_info(). Refresh the SYSENTER configuration in + * case prev or next is vm86. */ load_sp0(tss, next); + refresh_sysenter_cs(next); this_cpu_write(cpu_current_top_of_stack, (unsigned long)task_stack_page(next_p) + THREAD_SIZE); diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 302e7b2572d1..a6ff6d1a0110 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -464,7 +464,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) */ this_cpu_write(current_task, next_p); - /* Reload esp0 and ss1. This changes current_thread_info(). */ + /* Reload sp0. 
*/ load_sp0(tss, next); /* diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c index 68244742ecb0..a59b9ce75efc 100644 --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -55,6 +55,7 @@ #include #include #include +#include /* * Known problems: @@ -150,6 +151,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval) tsk->thread.sp0 = vm86->saved_sp0; tsk->thread.sysenter_cs = __KERNEL_CS; load_sp0(tss, &tsk->thread); + refresh_sysenter_cs(&tsk->thread); vm86->saved_sp0 = 0; put_cpu(); @@ -369,8 +371,10 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) /* make room for real-mode segments */ tsk->thread.sp0 += 16; - if (static_cpu_has(X86_FEATURE_SEP)) + if (static_cpu_has(X86_FEATURE_SEP)) { tsk->thread.sysenter_cs = 0; + refresh_sysenter_cs(&tsk->thread); + } load_sp0(tss, &tsk->thread); put_cpu(); From e37558449abadb91ac13b85737d963f230118bdf Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:10 -0700 Subject: [PATCH 0601/1013] x86/entry/64: Pass SP0 directly to load_sp0() commit da51da189a24bb9b7e2d5a123be096e51a4695a5 upstream. load_sp0() had an odd signature: void load_sp0(struct tss_struct *tss, struct thread_struct *thread); Simplify it to: void load_sp0(unsigned long sp0); Also simplify a few get_cpu()/put_cpu() sequences to preempt_disable()/preempt_enable(). Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/2655d8b42ed940aa384fe18ee1129bbbcf730a08.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/paravirt.h | 5 ++--- arch/x86/include/asm/paravirt_types.h | 2 +- arch/x86/include/asm/processor.h | 9 ++++----- arch/x86/kernel/cpu/common.c | 4 ++-- arch/x86/kernel/process_32.c | 2 +- arch/x86/kernel/process_64.c | 2 +- arch/x86/kernel/vm86_32.c | 14 ++++++-------- arch/x86/xen/enlighten_pv.c | 7 +++---- 8 files changed, 20 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index fd81228e8037..283efcaac8af 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -16,10 +16,9 @@ #include #include -static inline void load_sp0(struct tss_struct *tss, - struct thread_struct *thread) +static inline void load_sp0(unsigned long sp0) { - PVOP_VCALL2(pv_cpu_ops.load_sp0, tss, thread); + PVOP_VCALL1(pv_cpu_ops.load_sp0, sp0); } /* The paravirtualized CPUID instruction. 
*/ diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 10cc3b9709fe..6ec54d01972d 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -134,7 +134,7 @@ struct pv_cpu_ops { void (*alloc_ldt)(struct desc_struct *ldt, unsigned entries); void (*free_ldt)(struct desc_struct *ldt, unsigned entries); - void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t); + void (*load_sp0)(unsigned long sp0); void (*set_iopl_mask)(unsigned mask); diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index cdb4614f49fa..90f66e678294 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -518,9 +518,9 @@ static inline void native_set_iopl_mask(unsigned mask) } static inline void -native_load_sp0(struct tss_struct *tss, struct thread_struct *thread) +native_load_sp0(unsigned long sp0) { - tss->x86_tss.sp0 = thread->sp0; + this_cpu_write(cpu_tss.x86_tss.sp0, sp0); } static inline void native_swapgs(void) @@ -545,10 +545,9 @@ static inline unsigned long current_top_of_stack(void) #else #define __cpuid native_cpuid -static inline void load_sp0(struct tss_struct *tss, - struct thread_struct *thread) +static inline void load_sp0(unsigned long sp0) { - native_load_sp0(tss, thread); + native_load_sp0(sp0); } #define set_iopl_mask native_set_iopl_mask diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 03bb004bb15e..4e7fb9c3bfa5 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1570,7 +1570,7 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, me); - load_sp0(t, ¤t->thread); + load_sp0(current->thread.sp0); set_tss_desc(cpu, t); load_TR_desc(); load_mm_ldt(&init_mm); @@ -1625,7 +1625,7 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, curr); - load_sp0(t, thread); + load_sp0(thread->sp0); set_tss_desc(cpu, t); load_TR_desc(); load_mm_ldt(&init_mm); diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 0936ed3da6b6..40b85870e429 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) * current_thread_info(). Refresh the SYSENTER configuration in * case prev or next is vm86. */ - load_sp0(tss, next); + load_sp0(next->sp0); refresh_sysenter_cs(next); this_cpu_write(cpu_current_top_of_stack, (unsigned long)task_stack_page(next_p) + diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index a6ff6d1a0110..2124304fb77a 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) this_cpu_write(current_task, next_p); /* Reload sp0. 
*/ - load_sp0(tss, next); + load_sp0(next->sp0); /* * Now maybe reload the debug registers and handle I/O bitmaps diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c index a59b9ce75efc..f75e305a9b63 100644 --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -95,7 +95,6 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval) { - struct tss_struct *tss; struct task_struct *tsk = current; struct vm86plus_struct __user *user; struct vm86 *vm86 = current->thread.vm86; @@ -147,13 +146,13 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval) do_exit(SIGSEGV); } - tss = &per_cpu(cpu_tss, get_cpu()); + preempt_disable(); tsk->thread.sp0 = vm86->saved_sp0; tsk->thread.sysenter_cs = __KERNEL_CS; - load_sp0(tss, &tsk->thread); + load_sp0(tsk->thread.sp0); refresh_sysenter_cs(&tsk->thread); vm86->saved_sp0 = 0; - put_cpu(); + preempt_enable(); memcpy(®s->pt, &vm86->regs32, sizeof(struct pt_regs)); @@ -239,7 +238,6 @@ SYSCALL_DEFINE2(vm86, unsigned long, cmd, unsigned long, arg) static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) { - struct tss_struct *tss; struct task_struct *tsk = current; struct vm86 *vm86 = tsk->thread.vm86; struct kernel_vm86_regs vm86regs; @@ -367,8 +365,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) vm86->saved_sp0 = tsk->thread.sp0; lazy_save_gs(vm86->regs32.gs); - tss = &per_cpu(cpu_tss, get_cpu()); /* make room for real-mode segments */ + preempt_disable(); tsk->thread.sp0 += 16; if (static_cpu_has(X86_FEATURE_SEP)) { @@ -376,8 +374,8 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) refresh_sysenter_cs(&tsk->thread); } - load_sp0(tss, &tsk->thread); - put_cpu(); + load_sp0(tsk->thread.sp0); + preempt_enable(); if (vm86->flags & VM86_SCREEN_BITMAP) mark_screen_rdonly(tsk->mm); diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 60eb6dcacaac..e55d276afc70 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -811,15 +811,14 @@ static void __init xen_write_gdt_entry_boot(struct desc_struct *dt, int entry, } } -static void xen_load_sp0(struct tss_struct *tss, - struct thread_struct *thread) +static void xen_load_sp0(unsigned long sp0) { struct multicall_space mcs; mcs = xen_mc_entry(0); - MULTI_stack_switch(mcs.mc, __KERNEL_DS, thread->sp0); + MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0); xen_mc_issue(PARAVIRT_LAZY_CPU); - tss->x86_tss.sp0 = thread->sp0; + this_cpu_write(cpu_tss.x86_tss.sp0, sp0); } void xen_set_iopl_mask(unsigned mask) From f576136bc88175014ef3f57378d0d982a5c4e786 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:11 -0700 Subject: [PATCH 0602/1013] x86/entry: Add task_top_of_stack() to find the top of a task's stack commit 3500130b84a3cdc5b6796eba1daf178944935efe upstream. This will let us get rid of a few places that hardcode accesses to thread.sp0. 
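A short sketch of the relationship the new helper encodes, using only definitions visible in this series: pt_regs sits at the very top of a task's kernel stack (just below TOP_OF_KERNEL_STACK_PADDING), so one element past it is the usable top of the stack, which is what the hardcoded thread.sp0 reads were really after.

    /* Sketch, assuming the task_pt_regs()/task_top_of_stack() definitions
     * added and refined later in this series. */
    static inline unsigned long task_top_of_stack_sketch(struct task_struct *task)
    {
            return (unsigned long)(task_pt_regs(task) + 1);
    }

    /* Example replacement for a hardcoded read, as done for smpboot below: */
    /*   per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle);  */
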
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/b49b3f95a8ff858c40c9b0f5b32be0355324327d.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 90f66e678294..98278a238093 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -796,6 +796,8 @@ static inline void spin_lock_prefetch(const void *x) #define TOP_OF_INIT_STACK ((unsigned long)&init_stack + sizeof(init_stack) - \ TOP_OF_KERNEL_STACK_PADDING) +#define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1)) + #ifdef CONFIG_X86_32 /* * User space process size: 3GB (default). From 0917dd6e7a73c39d5bb1036e2e1fc73d5d56cb8c Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:12 -0700 Subject: [PATCH 0603/1013] x86/xen/64, x86/entry/64: Clean up SP code in cpu_initialize_context() commit f16b3da1dc936c0f8121741d0a1731bf242f2f56 upstream. I'm removing thread_struct::sp0, and Xen's usage of it is slightly dubious and unnecessary. Use appropriate helpers instead. While we're at at, reorder the code slightly to make it more obvious what's going on. Signed-off-by: Andy Lutomirski Reviewed-by: Juergen Gross Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/d5b9a3da2b47c68325bd2bbe8f82d9554dee0d0f.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/xen/smp_pv.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c index 05f91ce9b55e..c0c756c76afe 100644 --- a/arch/x86/xen/smp_pv.c +++ b/arch/x86/xen/smp_pv.c @@ -14,6 +14,7 @@ * single-threaded. */ #include +#include #include #include #include @@ -294,12 +295,19 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) #endif memset(&ctxt->fpu_ctxt, 0, sizeof(ctxt->fpu_ctxt)); + /* + * Bring up the CPU in cpu_bringup_and_idle() with the stack + * pointing just below where pt_regs would be if it were a normal + * kernel entry. + */ ctxt->user_regs.eip = (unsigned long)cpu_bringup_and_idle; ctxt->flags = VGCF_IN_KERNEL; ctxt->user_regs.eflags = 0x1000; /* IOPL_RING1 */ ctxt->user_regs.ds = __USER_DS; ctxt->user_regs.es = __USER_DS; ctxt->user_regs.ss = __KERNEL_DS; + ctxt->user_regs.cs = __KERNEL_CS; + ctxt->user_regs.esp = (unsigned long)task_pt_regs(idle); xen_copy_trap_info(ctxt->trap_ctxt); @@ -314,8 +322,13 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) ctxt->gdt_frames[0] = gdt_mfn; ctxt->gdt_ents = GDT_ENTRIES; + /* + * Set SS:SP that Xen will use when entering guest kernel mode + * from guest user mode. Subsequent calls to load_sp0() can + * change this value. 
+ */ ctxt->kernel_ss = __KERNEL_DS; - ctxt->kernel_sp = idle->thread.sp0; + ctxt->kernel_sp = task_top_of_stack(idle); #ifdef CONFIG_X86_32 ctxt->event_callback_cs = __KERNEL_CS; @@ -327,10 +340,8 @@ cpu_initialize_context(unsigned int cpu, struct task_struct *idle) (unsigned long)xen_hypervisor_callback; ctxt->failsafe_callback_eip = (unsigned long)xen_failsafe_callback; - ctxt->user_regs.cs = __KERNEL_CS; per_cpu(xen_cr3, cpu) = __pa(swapper_pg_dir); - ctxt->user_regs.esp = idle->thread.sp0 - sizeof(struct pt_regs); ctxt->ctrlreg[3] = xen_pfn_to_cr3(virt_to_gfn(swapper_pg_dir)); if (HYPERVISOR_vcpu_op(VCPUOP_initialise, xen_vcpu_nr(cpu), ctxt)) BUG(); From 71d7244efb0c9af0443f7376c844f3785a33497d Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:13 -0700 Subject: [PATCH 0604/1013] x86/entry/64: Stop initializing TSS.sp0 at boot commit 20bb83443ea79087b5e5f8dab4e9d80bb9bf7acb upstream. In my quest to get rid of thread_struct::sp0, I want to clean up or remove all of its readers. Two of them are in cpu_init() (32-bit and 64-bit), and they aren't needed. This is because we never enter userspace at all on the threads that CPUs are initialized in. Poison the initial TSS.sp0 and stop initializing it on CPU init. The comment text mostly comes from Dave Hansen. Thanks! Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/ee4a00540ad28c6cff475fbcc7769a4460acc861.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/common.c | 13 ++++++++++--- arch/x86/kernel/process.c | 8 +++++++- 2 files changed, 17 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 4e7fb9c3bfa5..cdf79ab628c2 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1570,9 +1570,13 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, me); - load_sp0(current->thread.sp0); + /* + * Initialize the TSS. Don't bother initializing sp0, as the initial + * task never enters user mode. + */ set_tss_desc(cpu, t); load_TR_desc(); + load_mm_ldt(&init_mm); clear_all_debug_regs(); @@ -1594,7 +1598,6 @@ void cpu_init(void) int cpu = smp_processor_id(); struct task_struct *curr = current; struct tss_struct *t = &per_cpu(cpu_tss, cpu); - struct thread_struct *thread = &curr->thread; wait_for_master_cpu(cpu); @@ -1625,9 +1628,13 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, curr); - load_sp0(thread->sp0); + /* + * Initialize the TSS. Don't bother initializing sp0, as the initial + * task never enters user mode. + */ set_tss_desc(cpu, t); load_TR_desc(); + load_mm_ldt(&init_mm); t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index c67685337c5a..97fb3e5737f5 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -49,7 +49,13 @@ */ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { .x86_tss = { - .sp0 = TOP_OF_INIT_STACK, + /* + * .sp0 is only used when entering ring 0 from a lower + * privilege level. Since the init task never runs anything + * but ring 0 code, there is no need for a valid value here. + * Poison it. 
+ */ + .sp0 = (1UL << (BITS_PER_LONG-1)) + 1, #ifdef CONFIG_X86_32 .ss0 = __KERNEL_DS, .ss1 = __KERNEL_CS, From c30eb760e3ecc7aebf37e557baecefb244733caa Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:14 -0700 Subject: [PATCH 0605/1013] x86/entry/64: Remove all remaining direct thread_struct::sp0 reads commit 46f5a10a721ce8dce8cc8fe55279b49e1c6b3288 upstream. The only remaining readers in context switch code or vm86(), and they all just want to update TSS.sp0 to match the current task. Replace them all with a new helper update_sp0(). Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/2d231687f4ff288c9d9e98d7861b7df374246ac3.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/switch_to.h | 6 ++++++ arch/x86/kernel/process_32.c | 2 +- arch/x86/kernel/process_64.c | 2 +- arch/x86/kernel/vm86_32.c | 4 ++-- 4 files changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index 0fe929d1ff70..92dca86a39e3 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -85,4 +85,10 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) } #endif +/* This is used when switching tasks or entering/exiting vm86 mode. */ +static inline void update_sp0(struct task_struct *task) +{ + load_sp0(task->thread.sp0); +} + #endif /* _ASM_X86_SWITCH_TO_H */ diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 40b85870e429..45bf0c5f93e1 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -287,7 +287,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) * current_thread_info(). Refresh the SYSENTER configuration in * case prev or next is vm86. */ - load_sp0(next->sp0); + update_sp0(next_p); refresh_sysenter_cs(next); this_cpu_write(cpu_current_top_of_stack, (unsigned long)task_stack_page(next_p) + diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 2124304fb77a..45e380958392 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -465,7 +465,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) this_cpu_write(current_task, next_p); /* Reload sp0. */ - load_sp0(next->sp0); + update_sp0(next_p); /* * Now maybe reload the debug registers and handle I/O bitmaps diff --git a/arch/x86/kernel/vm86_32.c b/arch/x86/kernel/vm86_32.c index f75e305a9b63..5edb27f1a2c4 100644 --- a/arch/x86/kernel/vm86_32.c +++ b/arch/x86/kernel/vm86_32.c @@ -149,7 +149,7 @@ void save_v86_state(struct kernel_vm86_regs *regs, int retval) preempt_disable(); tsk->thread.sp0 = vm86->saved_sp0; tsk->thread.sysenter_cs = __KERNEL_CS; - load_sp0(tsk->thread.sp0); + update_sp0(tsk); refresh_sysenter_cs(&tsk->thread); vm86->saved_sp0 = 0; preempt_enable(); @@ -374,7 +374,7 @@ static long do_sys_vm86(struct vm86plus_struct __user *user_vm86, bool plus) refresh_sysenter_cs(&tsk->thread); } - load_sp0(tsk->thread.sp0); + update_sp0(tsk); preempt_enable(); if (vm86->flags & VM86_SCREEN_BITMAP) From 266a0b19177e9ad7767aed90e14033dd46a8f000 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:15 -0700 Subject: [PATCH 0606/1013] x86/entry/32: Fix cpu_current_top_of_stack initialization at boot commit cd493a6deb8b78eca280d05f7fa73fd69403ae29 upstream. 
cpu_current_top_of_stack's initialization forgot about TOP_OF_KERNEL_STACK_PADDING. This bug didn't matter because the idle threads never enter user mode. Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/e5e370a7e6e4fddd1c4e4cf619765d96bb874b21.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/smpboot.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 5e0453f18a57..142126ab5aae 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -962,8 +962,7 @@ void common_cpu_up(unsigned int cpu, struct task_struct *idle) #ifdef CONFIG_X86_32 /* Stack for startup_32 can be just as for start_secondary onwards */ irq_ctx_init(cpu); - per_cpu(cpu_current_top_of_stack, cpu) = - (unsigned long)task_stack_page(idle) + THREAD_SIZE; + per_cpu(cpu_current_top_of_stack, cpu) = task_top_of_stack(idle); #else initial_gs = per_cpu_offset(cpu); #endif From c6f563cd1393521c3bd74730cb972f4ec373694a Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:16 -0700 Subject: [PATCH 0607/1013] x86/entry/64: Remove thread_struct::sp0 commit d375cf1530595e33961a8844192cddab913650e3 upstream. On x86_64, we can easily calculate sp0 when needed instead of storing it in thread_struct. On x86_32, a similar cleanup would be possible, but it would require cleaning up the vm86 code first, and that can wait for a later cleanup series. Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/719cd9c66c548c4350d98a90f050aee8b17f8919.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/compat.h | 1 + arch/x86/include/asm/processor.h | 28 +++++++++------------------- arch/x86/include/asm/switch_to.h | 6 ++++++ arch/x86/kernel/process_64.c | 1 - 4 files changed, 16 insertions(+), 20 deletions(-) diff --git a/arch/x86/include/asm/compat.h b/arch/x86/include/asm/compat.h index 70bc1df580b2..2cbd75dd2fd3 100644 --- a/arch/x86/include/asm/compat.h +++ b/arch/x86/include/asm/compat.h @@ -7,6 +7,7 @@ */ #include #include +#include #include #include #include diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 98278a238093..5c090050f75c 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -431,7 +431,9 @@ typedef struct { struct thread_struct { /* Cached TLS descriptors: */ struct desc_struct tls_array[GDT_ENTRY_TLS_ENTRIES]; +#ifdef CONFIG_X86_32 unsigned long sp0; +#endif unsigned long sp; #ifdef CONFIG_X86_32 unsigned long sysenter_cs; @@ -798,6 +800,13 @@ static inline void spin_lock_prefetch(const void *x) #define task_top_of_stack(task) ((unsigned long)(task_pt_regs(task) + 1)) +#define task_pt_regs(task) \ +({ \ + unsigned long __ptr = (unsigned long)task_stack_page(task); \ + __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \ + ((struct pt_regs *)__ptr) - 1; \ +}) + #ifdef CONFIG_X86_32 /* * User space process size: 3GB (default). @@ -817,23 +826,6 @@ static inline void spin_lock_prefetch(const void *x) .addr_limit = KERNEL_DS, \ } -/* - * TOP_OF_KERNEL_STACK_PADDING reserves 8 bytes on top of the ring0 stack. 
- * This is necessary to guarantee that the entire "struct pt_regs" - * is accessible even if the CPU haven't stored the SS/ESP registers - * on the stack (interrupt gate does not save these registers - * when switching to the same priv ring). - * Therefore beware: accessing the ss/esp fields of the - * "struct pt_regs" is possible, but they may contain the - * completely wrong values. - */ -#define task_pt_regs(task) \ -({ \ - unsigned long __ptr = (unsigned long)task_stack_page(task); \ - __ptr += THREAD_SIZE - TOP_OF_KERNEL_STACK_PADDING; \ - ((struct pt_regs *)__ptr) - 1; \ -}) - #define KSTK_ESP(task) (task_pt_regs(task)->sp) #else @@ -867,11 +859,9 @@ static inline void spin_lock_prefetch(const void *x) #define STACK_TOP_MAX TASK_SIZE_MAX #define INIT_THREAD { \ - .sp0 = TOP_OF_INIT_STACK, \ .addr_limit = KERNEL_DS, \ } -#define task_pt_regs(tsk) ((struct pt_regs *)(tsk)->thread.sp0 - 1) extern unsigned long KSTK_ESP(struct task_struct *task); #endif /* CONFIG_X86_64 */ diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index 92dca86a39e3..8c6bd6863db9 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_SWITCH_TO_H #define _ASM_X86_SWITCH_TO_H +#include + struct task_struct; /* one of the stranger aspects of C forward declarations */ struct task_struct *__switch_to_asm(struct task_struct *prev, @@ -88,7 +90,11 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) /* This is used when switching tasks or entering/exiting vm86 mode. */ static inline void update_sp0(struct task_struct *task) { +#ifdef CONFIG_X86_32 load_sp0(task->thread.sp0); +#else + load_sp0(task_top_of_stack(task)); +#endif } #endif /* _ASM_X86_SWITCH_TO_H */ diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 45e380958392..eeeb34f85c25 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -274,7 +274,6 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp, struct inactive_task_frame *frame; struct task_struct *me = current; - p->thread.sp0 = (unsigned long)task_stack_page(p) + THREAD_SIZE; childregs = task_pt_regs(p); fork_frame = container_of(childregs, struct fork_frame, regs); frame = &fork_frame->frame; From 35c1d57e63914254d463a2361b6348a2b823c70d Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Thu, 2 Nov 2017 00:59:17 -0700 Subject: [PATCH 0608/1013] x86/traps: Use a new on_thread_stack() helper to clean up an assertion commit 3383642c2f9d4f5b4fa37436db4a109a1a10018c upstream. Let's keep the stack-related logic together rather than open-coding a comparison in an assertion in the traps code. 
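For illustration only (not part of the patch): the assertion being centralized is a single unsigned range comparison. Below is a minimal, self-contained userspace model of that comparison; THREAD_SIZE_MODEL, top_of_stack and sp are stand-ins for the kernel's THREAD_SIZE, current_top_of_stack() and current_stack_pointer.

  #include <stdbool.h>
  #include <stdint.h>

  #define THREAD_SIZE_MODEL (16UL * 1024)   /* stand-in for THREAD_SIZE */

  /*
   * A stack pointer is "on the thread stack" iff it lies within
   * THREAD_SIZE below the top of that stack.  Doing the subtraction
   * in unsigned arithmetic folds both bounds into one comparison:
   * an sp above top_of_stack wraps around to a huge value and fails
   * the '< THREAD_SIZE' test, just like an sp that is too far below.
   */
  static bool on_stack_model(uintptr_t top_of_stack, uintptr_t sp)
  {
          return (top_of_stack - sp) < THREAD_SIZE_MODEL;
  }
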
Signed-off-by: Andy Lutomirski Reviewed-by: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/856b15bee1f55017b8f79d3758b0d51c48a08cf8.1509609304.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 6 ++++++ arch/x86/kernel/traps.c | 3 +-- 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 5c090050f75c..2db7cf720b04 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -542,6 +542,12 @@ static inline unsigned long current_top_of_stack(void) #endif } +static inline bool on_thread_stack(void) +{ + return (unsigned long)(current_top_of_stack() - + current_stack_pointer) < THREAD_SIZE; +} + #ifdef CONFIG_PARAVIRT #include #else diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 5a6b8f809792..d366adfc61da 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -141,8 +141,7 @@ void ist_begin_non_atomic(struct pt_regs *regs) * will catch asm bugs and any attempt to use ist_preempt_enable * from double_fault. */ - BUG_ON((unsigned long)(current_top_of_stack() - - current_stack_pointer) >= THREAD_SIZE); + BUG_ON(!on_thread_stack()); preempt_enable_no_resched(); } From b36c2c3ab339f50fb60f095ee508fc3376bdb1d6 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Thu, 2 Nov 2017 13:09:26 +0100 Subject: [PATCH 0609/1013] x86/entry/64: Shorten TEST instructions commit 1e4c4f610f774df6088d7c065b2dd4d22adba698 upstream. Convert TESTL to TESTB and save 3 bytes per callsite. No functionality change. Signed-off-by: Borislav Petkov Cc: Andy Lutomirski Cc: Brian Gerst Cc: Dave Hansen Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20171102120926.4srwerqrr7g72e2k@pd.tnic Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index a9c248b44305..5063ed1214dd 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -621,7 +621,7 @@ GLOBAL(retint_user) GLOBAL(swapgs_restore_regs_and_return_to_usermode) #ifdef CONFIG_DEBUG_ENTRY /* Assert that pt_regs indicates user mode. */ - testl $3, CS(%rsp) + testb $3, CS(%rsp) jnz 1f ud2 1: @@ -654,7 +654,7 @@ retint_kernel: GLOBAL(restore_regs_and_return_to_kernel) #ifdef CONFIG_DEBUG_ENTRY /* Assert that pt_regs indicates kernel mode. */ - testl $3, CS(%rsp) + testb $3, CS(%rsp) jz 1f ud2 1: From 3243ae92926c4ad3a7f9596952542c6803e69050 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 2 Nov 2017 13:22:35 +0100 Subject: [PATCH 0610/1013] x86/cpuid: Replace set/clear_bit32() commit 06dd688ddda5819025e014b79aea9af6ab475fa2 upstream. Peter pointed out that the set/clear_bit32() variants are broken in various aspects. Replace them with open coded set/clear_bit() and type cast cpu_info::x86_capability as it's done in all other places throughout x86. 
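For illustration only (not part of the patch): a small stand-alone model of the bit addressing that the open-coded cast relies on. On little-endian x86, a flat bit number applied to the array viewed as unsigned long lands on exactly the same bit as "word = nr / 32, bit = nr % 32" applied to the underlying u32 words, which is what makes the cast safe here; the bitops revert later in this series drops the generic u32 helpers precisely because that equivalence does not hold on big-endian 64-bit machines. The names below are local stand-ins, and a plain |= stands in for the kernel's atomic set_bit().

  #include <assert.h>
  #include <stdint.h>
  #include <string.h>

  /*
   * Model of a capability bitmap stored as 32-bit words, as in
   * cpuinfo_x86::x86_capability.  Aligned to unsigned long so the
   * cast below is well defined.
   */
  static _Alignas(unsigned long) uint32_t caps[4];

  /* The "cast to unsigned long *" addressing used by the patch. */
  static void set_cap_long(unsigned int nr)
  {
          unsigned long *p = (unsigned long *)caps;
          p[nr / (8 * sizeof(*p))] |= 1UL << (nr % (8 * sizeof(*p)));
  }

  int main(void)
  {
          memset(caps, 0, sizeof(caps));
          set_cap_long(34);                           /* word 1, bit 2 in the u32 view */
          assert(caps[34 / 32] & (1U << (34 % 32)));  /* holds on little-endian x86 */
          return 0;
  }
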
Fixes: 0b00de857a64 ("x86/cpuid: Add generic table for CPUID dependencies") Reported-by: Peter Ziljstra Signed-off-by: Thomas Gleixner Cc: Andi Kleen Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/cpuid-deps.c | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c index c21f22d836ad..904b0a3c4e53 100644 --- a/arch/x86/kernel/cpu/cpuid-deps.c +++ b/arch/x86/kernel/cpu/cpuid-deps.c @@ -62,23 +62,19 @@ const static struct cpuid_dep cpuid_deps[] = { {} }; -static inline void __clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit) -{ - clear_bit32(bit, c->x86_capability); -} - -static inline void __setup_clear_cpu_cap(unsigned int bit) -{ - clear_cpu_cap(&boot_cpu_data, bit); - set_bit32(bit, cpu_caps_cleared); -} - static inline void clear_feature(struct cpuinfo_x86 *c, unsigned int feature) { - if (!c) - __setup_clear_cpu_cap(feature); - else - __clear_cpu_cap(c, feature); + /* + * Note: This could use the non atomic __*_bit() variants, but the + * rest of the cpufeature code uses atomics as well, so keep it for + * consistency. Cleanup all of it separately. + */ + if (!c) { + clear_cpu_cap(&boot_cpu_data, feature); + set_bit(feature, (unsigned long *)cpu_caps_cleared); + } else { + clear_bit(feature, (unsigned long *)c->x86_capability); + } } /* Take the capabilities and the BUG bits into account */ From c1ffb6aefbc5aa6472e172d9c24c0e7a414a1e1f Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 2 Nov 2017 13:30:03 +0100 Subject: [PATCH 0611/1013] bitops: Revert cbe96375025e ("bitops: Add clear/set_bit32() to linux/bitops.h") commit 1943dc07b45e347c52c1bfdd4a37e04a86e399aa upstream. These ops are not endian safe and may break on architectures which have aligment requirements. Reverts: cbe96375025e ("bitops: Add clear/set_bit32() to linux/bitops.h") Reported-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Cc: Andi Kleen Signed-off-by: Greg Kroah-Hartman --- include/linux/bitops.h | 26 -------------------------- 1 file changed, 26 deletions(-) diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 8a7e9924df57..d03c5dd6185d 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -228,32 +228,6 @@ static inline unsigned long __ffs64(u64 word) return __ffs((unsigned long)word); } -/* - * clear_bit32 - Clear a bit in memory for u32 array - * @nr: Bit to clear - * @addr: u32 * address of bitmap - * - * Same as clear_bit, but avoids needing casts for u32 arrays. - */ - -static __always_inline void clear_bit32(long nr, volatile u32 *addr) -{ - clear_bit(nr, (volatile unsigned long *)addr); -} - -/* - * set_bit32 - Set a bit in memory for u32 array - * @nr: Bit to clear - * @addr: u32 * address of bitmap - * - * Same as set_bit, but avoids needing casts for u32 arrays. - */ - -static __always_inline void set_bit32(long nr, volatile u32 *addr) -{ - set_bit(nr, (volatile unsigned long *)addr); -} - #ifdef __KERNEL__ #ifndef set_mask_bits From 64766453be2ee380ba7f3620bfeadcee2787145f Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 3 Nov 2017 11:20:28 +0100 Subject: [PATCH 0612/1013] x86/mm: Define _PAGE_TABLE using _KERNPG_TABLE commit c7da092a1f243bfd1bfb4124f538e69e941882da upstream. ... so that the difference is obvious. No functionality change. 
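For illustration only (not part of the patch): the "no functionality change" claim can be restated as a compile-time identity. The macro names below are local stand-ins, the bit positions are the architectural x86 PTE bits, and the SME encryption mask is modelled as 0 because its real value is only known at runtime.

  /* Architectural x86 PTE permission bits (low bits only). */
  #define P_PRESENT   (1UL << 0)
  #define P_RW        (1UL << 1)
  #define P_USER      (1UL << 2)
  #define P_ACCESSED  (1UL << 5)
  #define P_DIRTY     (1UL << 6)
  #define P_ENC       0UL   /* stand-in for the runtime SME mask */

  /* Old spelling: every bit listed explicitly. */
  #define PAGE_TABLE_OLD (P_PRESENT | P_RW | P_USER | P_ACCESSED | P_DIRTY | P_ENC)
  /* New spelling: the kernel mask plus the user bit. */
  #define KERNPG_TABLE   (P_PRESENT | P_RW | P_ACCESSED | P_DIRTY | P_ENC)
  #define PAGE_TABLE_NEW (KERNPG_TABLE | P_USER)

  _Static_assert(PAGE_TABLE_OLD == PAGE_TABLE_NEW,
                 "the rewrite only regroups the same bits");
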
Signed-off-by: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20171103102028.20284-1-bp@alien8.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable_types.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 59df7b47a434..9e9b05fc4860 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -200,10 +200,9 @@ enum page_cache_mode { #define _PAGE_ENC (_AT(pteval_t, sme_me_mask)) -#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \ - _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC) #define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \ _PAGE_DIRTY | _PAGE_ENC) +#define _PAGE_TABLE (_KERNPG_TABLE | _PAGE_USER) #define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC) #define __PAGE_KERNEL_ENC_WP (__PAGE_KERNEL_WP | _PAGE_ENC) From 47af9e68f3f2e85b043e2c67e90d5aa90220a1d4 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Tue, 31 Oct 2017 13:17:22 +0100 Subject: [PATCH 0613/1013] x86/cpufeatures: Re-tabulate the X86_FEATURE definitions commit acbc845ffefd9fb70466182cd8555a26189462b2 upstream. Over the years asm/cpufeatures.h has become somewhat of a mess: the original tabulation style was too narrow, while x86 feature names also kept growing in length, creating frequent field width overflows. Re-tabulate it to make it wider and easier to read/modify. Also harmonize the tabulation of the other defines in this file to match it. Cc: Andrew Morton Cc: Andy Lutomirski Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20171031121723.28524-3-mingo@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 508 ++++++++++++++--------------- 1 file changed, 254 insertions(+), 254 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 74370734663c..ad1b835001cc 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -13,8 +13,8 @@ /* * Defines x86 CPU feature bits */ -#define NCAPINTS 18 /* N 32-bit words worth of info */ -#define NBUGINTS 1 /* N 32-bit bug flags */ +#define NCAPINTS 18 /* N 32-bit words worth of info */ +#define NBUGINTS 1 /* N 32-bit bug flags */ /* * Note: If the comment begins with a quoted string, that string is used @@ -28,163 +28,163 @@ */ /* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */ -#define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ -#define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ -#define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ -#define X86_FEATURE_PSE ( 0*32+ 3) /* Page Size Extensions */ -#define X86_FEATURE_TSC ( 0*32+ 4) /* Time Stamp Counter */ -#define X86_FEATURE_MSR ( 0*32+ 5) /* Model-Specific Registers */ -#define X86_FEATURE_PAE ( 0*32+ 6) /* Physical Address Extensions */ -#define X86_FEATURE_MCE ( 0*32+ 7) /* Machine Check Exception */ -#define X86_FEATURE_CX8 ( 0*32+ 8) /* CMPXCHG8 instruction */ -#define X86_FEATURE_APIC ( 0*32+ 9) /* Onboard APIC */ -#define X86_FEATURE_SEP ( 0*32+11) /* SYSENTER/SYSEXIT */ -#define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ -#define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ -#define X86_FEATURE_MCA ( 0*32+14) /* Machine Check 
Architecture */ -#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ +#define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ +#define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ +#define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ +#define X86_FEATURE_PSE ( 0*32+ 3) /* Page Size Extensions */ +#define X86_FEATURE_TSC ( 0*32+ 4) /* Time Stamp Counter */ +#define X86_FEATURE_MSR ( 0*32+ 5) /* Model-Specific Registers */ +#define X86_FEATURE_PAE ( 0*32+ 6) /* Physical Address Extensions */ +#define X86_FEATURE_MCE ( 0*32+ 7) /* Machine Check Exception */ +#define X86_FEATURE_CX8 ( 0*32+ 8) /* CMPXCHG8 instruction */ +#define X86_FEATURE_APIC ( 0*32+ 9) /* Onboard APIC */ +#define X86_FEATURE_SEP ( 0*32+11) /* SYSENTER/SYSEXIT */ +#define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ +#define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ +#define X86_FEATURE_MCA ( 0*32+14) /* Machine Check Architecture */ +#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ /* (plus FCMOVcc, FCOMI with FPU) */ -#define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ -#define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ -#define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ -#define X86_FEATURE_CLFLUSH ( 0*32+19) /* CLFLUSH instruction */ -#define X86_FEATURE_DS ( 0*32+21) /* "dts" Debug Store */ -#define X86_FEATURE_ACPI ( 0*32+22) /* ACPI via MSR */ -#define X86_FEATURE_MMX ( 0*32+23) /* Multimedia Extensions */ -#define X86_FEATURE_FXSR ( 0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */ -#define X86_FEATURE_XMM ( 0*32+25) /* "sse" */ -#define X86_FEATURE_XMM2 ( 0*32+26) /* "sse2" */ -#define X86_FEATURE_SELFSNOOP ( 0*32+27) /* "ss" CPU self snoop */ -#define X86_FEATURE_HT ( 0*32+28) /* Hyper-Threading */ -#define X86_FEATURE_ACC ( 0*32+29) /* "tm" Automatic clock control */ -#define X86_FEATURE_IA64 ( 0*32+30) /* IA-64 processor */ -#define X86_FEATURE_PBE ( 0*32+31) /* Pending Break Enable */ +#define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ +#define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ +#define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ +#define X86_FEATURE_CLFLUSH ( 0*32+19) /* CLFLUSH instruction */ +#define X86_FEATURE_DS ( 0*32+21) /* "dts" Debug Store */ +#define X86_FEATURE_ACPI ( 0*32+22) /* ACPI via MSR */ +#define X86_FEATURE_MMX ( 0*32+23) /* Multimedia Extensions */ +#define X86_FEATURE_FXSR ( 0*32+24) /* FXSAVE/FXRSTOR, CR4.OSFXSR */ +#define X86_FEATURE_XMM ( 0*32+25) /* "sse" */ +#define X86_FEATURE_XMM2 ( 0*32+26) /* "sse2" */ +#define X86_FEATURE_SELFSNOOP ( 0*32+27) /* "ss" CPU self snoop */ +#define X86_FEATURE_HT ( 0*32+28) /* Hyper-Threading */ +#define X86_FEATURE_ACC ( 0*32+29) /* "tm" Automatic clock control */ +#define X86_FEATURE_IA64 ( 0*32+30) /* IA-64 processor */ +#define X86_FEATURE_PBE ( 0*32+31) /* Pending Break Enable */ /* AMD-defined CPU features, CPUID level 0x80000001, word 1 */ /* Don't duplicate feature flags which are redundant with Intel! */ -#define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ -#define X86_FEATURE_MP ( 1*32+19) /* MP Capable. 
*/ -#define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ -#define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ -#define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ -#define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ -#define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ -#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64) */ -#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow! extensions */ -#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow! */ +#define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ +#define X86_FEATURE_MP ( 1*32+19) /* MP Capable. */ +#define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ +#define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ +#define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ +#define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ +#define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ +#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64) */ +#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow! extensions */ +#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow! */ /* Transmeta-defined CPU features, CPUID level 0x80860001, word 2 */ -#define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ -#define X86_FEATURE_LONGRUN ( 2*32+ 1) /* Longrun power control */ -#define X86_FEATURE_LRTI ( 2*32+ 3) /* LongRun table interface */ +#define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ +#define X86_FEATURE_LONGRUN ( 2*32+ 1) /* Longrun power control */ +#define X86_FEATURE_LRTI ( 2*32+ 3) /* LongRun table interface */ /* Other features, Linux-defined mapping, word 3 */ /* This range is used for feature bits which conflict or are synthesized */ -#define X86_FEATURE_CXMMX ( 3*32+ 0) /* Cyrix MMX extensions */ -#define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ -#define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ -#define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ +#define X86_FEATURE_CXMMX ( 3*32+ 0) /* Cyrix MMX extensions */ +#define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ +#define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ +#define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ /* cpu types for specific tunings: */ -#define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ -#define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ -#define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ -#define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ -#define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ -#define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ -#define X86_FEATURE_ART ( 3*32+10) /* Platform has always running timer (ART) */ -#define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ -#define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ -#define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ -#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in ia32 userspace */ -#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in ia32 userspace */ -#define X86_FEATURE_REP_GOOD ( 3*32+16) /* rep microcode works well */ -#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" Mfence synchronizes RDTSC */ -#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" Lfence synchronizes RDTSC */ -#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ -#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ -#define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present 
feature */ -#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* cpu topology enum extensions */ -#define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ -#define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ -#define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ -#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ -#define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ -#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ -#define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ -#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ +#define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ +#define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ +#define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ +#define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ +#define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ +#define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ +#define X86_FEATURE_ART ( 3*32+10) /* Platform has always running timer (ART) */ +#define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ +#define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ +#define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ +#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in ia32 userspace */ +#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in ia32 userspace */ +#define X86_FEATURE_REP_GOOD ( 3*32+16) /* rep microcode works well */ +#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" Mfence synchronizes RDTSC */ +#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" Lfence synchronizes RDTSC */ +#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ +#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ +#define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */ +#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* cpu topology enum extensions */ +#define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ +#define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ +#define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ +#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ +#define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ +#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ +#define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ +#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ -#define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ -#define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ -#define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ -#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" Monitor/Mwait support */ -#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL Qual. 
Debug Store */ -#define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ -#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer mode */ -#define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ -#define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ -#define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ -#define X86_FEATURE_CID ( 4*32+10) /* Context ID */ -#define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ -#define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ -#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B */ -#define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ -#define X86_FEATURE_PDCM ( 4*32+15) /* Performance Capabilities */ -#define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ -#define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ -#define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ -#define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ -#define X86_FEATURE_X2APIC ( 4*32+21) /* x2APIC */ -#define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ -#define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ +#define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ +#define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ +#define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ +#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" Monitor/Mwait support */ +#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL Qual. Debug Store */ +#define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ +#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer mode */ +#define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ +#define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ +#define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ +#define X86_FEATURE_CID ( 4*32+10) /* Context ID */ +#define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ +#define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ +#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B */ +#define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ +#define X86_FEATURE_PDCM ( 4*32+15) /* Performance Capabilities */ +#define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ +#define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ +#define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ +#define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ +#define X86_FEATURE_X2APIC ( 4*32+21) /* x2APIC */ +#define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ +#define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ #define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* Tsc deadline timer */ -#define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ -#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ -#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE enabled in the OS */ -#define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ -#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit fp conversions */ -#define X86_FEATURE_RDRAND ( 4*32+30) /* The RDRAND instruction */ -#define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */ +#define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ +#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ +#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE enabled in the OS */ +#define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ +#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit fp conversions */ +#define X86_FEATURE_RDRAND ( 4*32+30) /* The RDRAND instruction */ +#define X86_FEATURE_HYPERVISOR ( 
4*32+31) /* Running on a hypervisor */ /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */ -#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */ -#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */ -#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */ -#define X86_FEATURE_XCRYPT_EN ( 5*32+ 7) /* "ace_en" on-CPU crypto enabled */ -#define X86_FEATURE_ACE2 ( 5*32+ 8) /* Advanced Cryptography Engine v2 */ -#define X86_FEATURE_ACE2_EN ( 5*32+ 9) /* ACE v2 enabled */ -#define X86_FEATURE_PHE ( 5*32+10) /* PadLock Hash Engine */ -#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */ -#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ -#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */ +#define X86_FEATURE_XSTORE ( 5*32+ 2) /* "rng" RNG present (xstore) */ +#define X86_FEATURE_XSTORE_EN ( 5*32+ 3) /* "rng_en" RNG enabled */ +#define X86_FEATURE_XCRYPT ( 5*32+ 6) /* "ace" on-CPU crypto (xcrypt) */ +#define X86_FEATURE_XCRYPT_EN ( 5*32+ 7) /* "ace_en" on-CPU crypto enabled */ +#define X86_FEATURE_ACE2 ( 5*32+ 8) /* Advanced Cryptography Engine v2 */ +#define X86_FEATURE_ACE2_EN ( 5*32+ 9) /* ACE v2 enabled */ +#define X86_FEATURE_PHE ( 5*32+10) /* PadLock Hash Engine */ +#define X86_FEATURE_PHE_EN ( 5*32+11) /* PHE enabled */ +#define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ +#define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */ /* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */ -#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ -#define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ -#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure virtual machine */ -#define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ -#define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ -#define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ -#define X86_FEATURE_SSE4A ( 6*32+ 6) /* SSE-4A */ -#define X86_FEATURE_MISALIGNSSE ( 6*32+ 7) /* Misaligned SSE mode */ -#define X86_FEATURE_3DNOWPREFETCH ( 6*32+ 8) /* 3DNow prefetch instructions */ -#define X86_FEATURE_OSVW ( 6*32+ 9) /* OS Visible Workaround */ -#define X86_FEATURE_IBS ( 6*32+10) /* Instruction Based Sampling */ -#define X86_FEATURE_XOP ( 6*32+11) /* extended AVX instructions */ -#define X86_FEATURE_SKINIT ( 6*32+12) /* SKINIT/STGI instructions */ -#define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ -#define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ -#define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ -#define X86_FEATURE_TCE ( 6*32+17) /* translation cache extension */ -#define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ -#define X86_FEATURE_TBM ( 6*32+21) /* trailing bit manipulations */ -#define X86_FEATURE_TOPOEXT ( 6*32+22) /* topology extensions CPUID leafs */ -#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */ -#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ -#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */ -#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */ -#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ -#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */ +#define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ +#define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ +#define 
X86_FEATURE_SVM ( 6*32+ 2) /* Secure virtual machine */ +#define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ +#define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ +#define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ +#define X86_FEATURE_SSE4A ( 6*32+ 6) /* SSE-4A */ +#define X86_FEATURE_MISALIGNSSE ( 6*32+ 7) /* Misaligned SSE mode */ +#define X86_FEATURE_3DNOWPREFETCH ( 6*32+ 8) /* 3DNow prefetch instructions */ +#define X86_FEATURE_OSVW ( 6*32+ 9) /* OS Visible Workaround */ +#define X86_FEATURE_IBS ( 6*32+10) /* Instruction Based Sampling */ +#define X86_FEATURE_XOP ( 6*32+11) /* extended AVX instructions */ +#define X86_FEATURE_SKINIT ( 6*32+12) /* SKINIT/STGI instructions */ +#define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ +#define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ +#define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ +#define X86_FEATURE_TCE ( 6*32+17) /* translation cache extension */ +#define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ +#define X86_FEATURE_TBM ( 6*32+21) /* trailing bit manipulations */ +#define X86_FEATURE_TOPOEXT ( 6*32+22) /* topology extensions CPUID leafs */ +#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */ +#define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ +#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */ +#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */ +#define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ +#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */ /* * Auxiliary flags: Linux defined - For features scattered in various @@ -192,152 +192,152 @@ * * Reuse free bits when adding new feature flags! 
*/ -#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT */ -#define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ -#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ -#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ -#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ -#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ -#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ +#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT */ +#define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ +#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ +#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ +#define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ +#define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ +#define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ -#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ -#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ -#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ +#define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ +#define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ +#define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ -#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ -#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ -#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ -#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ +#define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ +#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ +#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ -#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ +#define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ /* Virtualization flags: Linux defined, word 8 */ -#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ -#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */ -#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */ -#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ -#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */ +#define X86_FEATURE_TPR_SHADOW ( 8*32+ 0) /* Intel TPR Shadow */ +#define X86_FEATURE_VNMI ( 8*32+ 1) /* Intel Virtual NMI */ +#define X86_FEATURE_FLEXPRIORITY ( 8*32+ 2) /* Intel FlexPriority */ +#define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ +#define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */ -#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ -#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */ +#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ +#define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */ -#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/ -#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3b */ -#define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation 
extensions */ -#define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ -#define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ -#define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ -#define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation extensions */ -#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */ -#define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ -#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ -#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ -#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */ -#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ -#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ -#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ -#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ -#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ -#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ -#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ -#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ -#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ -#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ -#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ -#define X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ -#define X86_FEATURE_SHA_NI ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */ -#define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ -#define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */ +#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/ +#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3b */ +#define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */ +#define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ +#define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ +#define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ +#define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation extensions */ +#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */ +#define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ +#define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ +#define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ +#define X86_FEATURE_MPX ( 9*32+14) /* Memory Protection Extension */ +#define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ +#define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ +#define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ +#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ +#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ +#define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ +#define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ +#define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ +#define X86_FEATURE_CLWB ( 9*32+24) /* CLWB instruction */ +#define X86_FEATURE_AVX512PF ( 9*32+26) /* AVX-512 Prefetch */ +#define X86_FEATURE_AVX512ER ( 9*32+27) /* AVX-512 Exponential and Reciprocal */ +#define 
X86_FEATURE_AVX512CD ( 9*32+28) /* AVX-512 Conflict Detection */ +#define X86_FEATURE_SHA_NI ( 9*32+29) /* SHA1/SHA256 Instruction Extensions */ +#define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ +#define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */ /* Extended state features, CPUID level 0x0000000d:1 (eax), word 10 */ -#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ -#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ -#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */ -#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */ +#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ +#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ +#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */ +#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */ /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (edx), word 11 */ -#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */ +#define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */ /* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */ -#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */ -#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ -#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */ +#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */ +#define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ +#define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */ /* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */ -#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ -#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */ +#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ +#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */ /* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */ -#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ -#define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ -#define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ -#define X86_FEATURE_PLN (14*32+ 4) /* Intel Power Limit Notification */ -#define X86_FEATURE_PTS (14*32+ 6) /* Intel Package Thermal Status */ -#define X86_FEATURE_HWP (14*32+ 7) /* Intel Hardware P-states */ -#define X86_FEATURE_HWP_NOTIFY (14*32+ 8) /* HWP Notification */ -#define X86_FEATURE_HWP_ACT_WINDOW (14*32+ 9) /* HWP Activity Window */ -#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. Preference */ -#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */ +#define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ +#define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ +#define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ +#define X86_FEATURE_PLN (14*32+ 4) /* Intel Power Limit Notification */ +#define X86_FEATURE_PTS (14*32+ 6) /* Intel Package Thermal Status */ +#define X86_FEATURE_HWP (14*32+ 7) /* Intel Hardware P-states */ +#define X86_FEATURE_HWP_NOTIFY (14*32+ 8) /* HWP Notification */ +#define X86_FEATURE_HWP_ACT_WINDOW (14*32+ 9) /* HWP Activity Window */ +#define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy Perf. 
Preference */ +#define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */ /* AMD SVM Feature Identification, CPUID level 0x8000000a (edx), word 15 */ -#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ -#define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ -#define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ -#define X86_FEATURE_NRIPS (15*32+ 3) /* "nrip_save" SVM next_rip save */ -#define X86_FEATURE_TSCRATEMSR (15*32+ 4) /* "tsc_scale" TSC scaling support */ -#define X86_FEATURE_VMCBCLEAN (15*32+ 5) /* "vmcb_clean" VMCB clean bits support */ -#define X86_FEATURE_FLUSHBYASID (15*32+ 6) /* flush-by-ASID support */ -#define X86_FEATURE_DECODEASSISTS (15*32+ 7) /* Decode Assists support */ -#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */ -#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */ -#define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */ -#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ -#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */ +#define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ +#define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ +#define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ +#define X86_FEATURE_NRIPS (15*32+ 3) /* "nrip_save" SVM next_rip save */ +#define X86_FEATURE_TSCRATEMSR (15*32+ 4) /* "tsc_scale" TSC scaling support */ +#define X86_FEATURE_VMCBCLEAN (15*32+ 5) /* "vmcb_clean" VMCB clean bits support */ +#define X86_FEATURE_FLUSHBYASID (15*32+ 6) /* flush-by-ASID support */ +#define X86_FEATURE_DECODEASSISTS (15*32+ 7) /* Decode Assists support */ +#define X86_FEATURE_PAUSEFILTER (15*32+10) /* filtered pause intercept */ +#define X86_FEATURE_PFTHRESHOLD (15*32+12) /* pause filter threshold */ +#define X86_FEATURE_AVIC (15*32+13) /* Virtual Interrupt Controller */ +#define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ +#define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */ /* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ -#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ -#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ -#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ -#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ -#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ -#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ -#define X86_FEATURE_VPCLMULQDQ (16*32+ 10) /* Carry-Less Multiplication Double Quadword */ -#define X86_FEATURE_AVX512_VNNI (16*32+ 11) /* Vector Neural Network Instructions */ -#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */ -#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ -#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ -#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ +#define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +#define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ +#define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ +#define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ +#define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ +#define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ 
+#define X86_FEATURE_VPCLMULQDQ (16*32+ 10) /* Carry-Less Multiplication Double Quadword */ +#define X86_FEATURE_AVX512_VNNI (16*32+ 11) /* Vector Neural Network Instructions */ +#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */ +#define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ +#define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ +#define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ /* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */ -#define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ -#define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ -#define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */ +#define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ +#define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ +#define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */ /* * BUG word(s) */ -#define X86_BUG(x) (NCAPINTS*32 + (x)) +#define X86_BUG(x) (NCAPINTS*32 + (x)) -#define X86_BUG_F00F X86_BUG(0) /* Intel F00F */ -#define X86_BUG_FDIV X86_BUG(1) /* FPU FDIV */ -#define X86_BUG_COMA X86_BUG(2) /* Cyrix 6x86 coma */ -#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* "tlb_mmatch" AMD Erratum 383 */ -#define X86_BUG_AMD_APIC_C1E X86_BUG(4) /* "apic_c1e" AMD Erratum 400 */ -#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */ -#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */ -#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */ -#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */ +#define X86_BUG_F00F X86_BUG(0) /* Intel F00F */ +#define X86_BUG_FDIV X86_BUG(1) /* FPU FDIV */ +#define X86_BUG_COMA X86_BUG(2) /* Cyrix 6x86 coma */ +#define X86_BUG_AMD_TLB_MMATCH X86_BUG(3) /* "tlb_mmatch" AMD Erratum 383 */ +#define X86_BUG_AMD_APIC_C1E X86_BUG(4) /* "apic_c1e" AMD Erratum 400 */ +#define X86_BUG_11AP X86_BUG(5) /* Bad local APIC aka 11AP */ +#define X86_BUG_FXSAVE_LEAK X86_BUG(6) /* FXSAVE leaks FOP/FIP/FOP */ +#define X86_BUG_CLFLUSH_MONITOR X86_BUG(7) /* AAI65, CLFLUSH required before MONITOR */ +#define X86_BUG_SYSRET_SS_ATTRS X86_BUG(8) /* SYSRET doesn't fix up SS attrs */ #ifdef CONFIG_X86_32 /* * 64-bit kernels don't use X86_BUG_ESPFIX. Make the define conditional * to avoid confusion. 
*/ -#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ +#define X86_BUG_ESPFIX X86_BUG(9) /* "" IRET to 16-bit SS corrupts ESP/RSP high bits */ #endif -#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ -#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ -#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ -#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ +#define X86_BUG_NULL_SEG X86_BUG(10) /* Nulling a selector preserves the base */ +#define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ +#define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ +#define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ #endif /* _ASM_X86_CPUFEATURES_H */ From d9eb267780ff856580885df063c5f4a6630e345f Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Tue, 31 Oct 2017 13:17:23 +0100 Subject: [PATCH 0614/1013] x86/cpufeatures: Fix various details in the feature definitions commit f3a624e901c633593156f7b00ca743a6204a29bc upstream. Kept this commit separate from the re-tabulation changes, to make the changes easier to review: - add better explanation for entries with no explanation - fix/enhance the text of some of the entries - fix the vertical alignment of some of the feature number definitions - fix inconsistent capitalization - ... and lots of other small details i.e. make it all more of a coherent unit, instead of a patchwork of years of additions. Cc: Andrew Morton Cc: Andy Lutomirski Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20171031121723.28524-4-mingo@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 149 ++++++++++++++--------------- 1 file changed, 74 insertions(+), 75 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index ad1b835001cc..cdf5be866863 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -20,14 +20,12 @@ * Note: If the comment begins with a quoted string, that string is used * in /proc/cpuinfo instead of the macro name. If the string is "", * this feature bit is not displayed in /proc/cpuinfo at all. - */ - -/* + * * When adding new features here that depend on other features, - * please update the table in kernel/cpu/cpuid-deps.c + * please update the table in kernel/cpu/cpuid-deps.c as well. 
*/ -/* Intel-defined CPU features, CPUID level 0x00000001 (edx), word 0 */ +/* Intel-defined CPU features, CPUID level 0x00000001 (EDX), word 0 */ #define X86_FEATURE_FPU ( 0*32+ 0) /* Onboard FPU */ #define X86_FEATURE_VME ( 0*32+ 1) /* Virtual Mode Extensions */ #define X86_FEATURE_DE ( 0*32+ 2) /* Debugging Extensions */ @@ -42,8 +40,7 @@ #define X86_FEATURE_MTRR ( 0*32+12) /* Memory Type Range Registers */ #define X86_FEATURE_PGE ( 0*32+13) /* Page Global Enable */ #define X86_FEATURE_MCA ( 0*32+14) /* Machine Check Architecture */ -#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions */ - /* (plus FCMOVcc, FCOMI with FPU) */ +#define X86_FEATURE_CMOV ( 0*32+15) /* CMOV instructions (plus FCMOVcc, FCOMI with FPU) */ #define X86_FEATURE_PAT ( 0*32+16) /* Page Attribute Table */ #define X86_FEATURE_PSE36 ( 0*32+17) /* 36-bit PSEs */ #define X86_FEATURE_PN ( 0*32+18) /* Processor serial number */ @@ -63,15 +60,15 @@ /* AMD-defined CPU features, CPUID level 0x80000001, word 1 */ /* Don't duplicate feature flags which are redundant with Intel! */ #define X86_FEATURE_SYSCALL ( 1*32+11) /* SYSCALL/SYSRET */ -#define X86_FEATURE_MP ( 1*32+19) /* MP Capable. */ +#define X86_FEATURE_MP ( 1*32+19) /* MP Capable */ #define X86_FEATURE_NX ( 1*32+20) /* Execute Disable */ #define X86_FEATURE_MMXEXT ( 1*32+22) /* AMD MMX extensions */ #define X86_FEATURE_FXSR_OPT ( 1*32+25) /* FXSAVE/FXRSTOR optimizations */ #define X86_FEATURE_GBPAGES ( 1*32+26) /* "pdpe1gb" GB pages */ #define X86_FEATURE_RDTSCP ( 1*32+27) /* RDTSCP */ -#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64) */ -#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow! extensions */ -#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow! */ +#define X86_FEATURE_LM ( 1*32+29) /* Long Mode (x86-64, 64-bit support) */ +#define X86_FEATURE_3DNOWEXT ( 1*32+30) /* AMD 3DNow extensions */ +#define X86_FEATURE_3DNOW ( 1*32+31) /* 3DNow */ /* Transmeta-defined CPU features, CPUID level 0x80860001, word 2 */ #define X86_FEATURE_RECOVERY ( 2*32+ 0) /* CPU in recovery mode */ @@ -84,66 +81,67 @@ #define X86_FEATURE_K6_MTRR ( 3*32+ 1) /* AMD K6 nonstandard MTRRs */ #define X86_FEATURE_CYRIX_ARR ( 3*32+ 2) /* Cyrix ARRs (= MTRRs) */ #define X86_FEATURE_CENTAUR_MCR ( 3*32+ 3) /* Centaur MCRs (= MTRRs) */ -/* cpu types for specific tunings: */ + +/* CPU types for specific tunings: */ #define X86_FEATURE_K8 ( 3*32+ 4) /* "" Opteron, Athlon64 */ #define X86_FEATURE_K7 ( 3*32+ 5) /* "" Athlon */ #define X86_FEATURE_P3 ( 3*32+ 6) /* "" P3 */ #define X86_FEATURE_P4 ( 3*32+ 7) /* "" P4 */ #define X86_FEATURE_CONSTANT_TSC ( 3*32+ 8) /* TSC ticks at a constant rate */ -#define X86_FEATURE_UP ( 3*32+ 9) /* smp kernel running on up */ -#define X86_FEATURE_ART ( 3*32+10) /* Platform has always running timer (ART) */ +#define X86_FEATURE_UP ( 3*32+ 9) /* SMP kernel running on UP */ +#define X86_FEATURE_ART ( 3*32+10) /* Always running timer (ART) */ #define X86_FEATURE_ARCH_PERFMON ( 3*32+11) /* Intel Architectural PerfMon */ #define X86_FEATURE_PEBS ( 3*32+12) /* Precise-Event Based Sampling */ #define X86_FEATURE_BTS ( 3*32+13) /* Branch Trace Store */ -#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in ia32 userspace */ -#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in ia32 userspace */ -#define X86_FEATURE_REP_GOOD ( 3*32+16) /* rep microcode works well */ -#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" Mfence synchronizes RDTSC */ -#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" Lfence synchronizes RDTSC */ +#define X86_FEATURE_SYSCALL32 ( 
3*32+14) /* "" syscall in IA32 userspace */ +#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */ +#define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */ +#define X86_FEATURE_MFENCE_RDTSC ( 3*32+17) /* "" MFENCE synchronizes RDTSC */ +#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" LFENCE synchronizes RDTSC */ #define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */ #define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */ #define X86_FEATURE_ALWAYS ( 3*32+21) /* "" Always-present feature */ -#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* cpu topology enum extensions */ +#define X86_FEATURE_XTOPOLOGY ( 3*32+22) /* CPU topology enum extensions */ #define X86_FEATURE_TSC_RELIABLE ( 3*32+23) /* TSC is known to be reliable */ #define X86_FEATURE_NONSTOP_TSC ( 3*32+24) /* TSC does not stop in C states */ #define X86_FEATURE_CPUID ( 3*32+25) /* CPU has CPUID instruction itself */ -#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* has extended APICID (8 bits) */ -#define X86_FEATURE_AMD_DCM ( 3*32+27) /* multi-node processor */ -#define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ +#define X86_FEATURE_EXTD_APICID ( 3*32+26) /* Extended APICID (8 bits) */ +#define X86_FEATURE_AMD_DCM ( 3*32+27) /* AMD multi-node processor */ +#define X86_FEATURE_APERFMPERF ( 3*32+28) /* P-State hardware coordination feedback capability (APERF/MPERF MSRs) */ #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ #define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ -/* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ +/* Intel-defined CPU features, CPUID level 0x00000001 (ECX), word 4 */ #define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ #define X86_FEATURE_PCLMULQDQ ( 4*32+ 1) /* PCLMULQDQ instruction */ #define X86_FEATURE_DTES64 ( 4*32+ 2) /* 64-bit Debug Store */ -#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" Monitor/Mwait support */ -#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL Qual. 
Debug Store */ +#define X86_FEATURE_MWAIT ( 4*32+ 3) /* "monitor" MONITOR/MWAIT support */ +#define X86_FEATURE_DSCPL ( 4*32+ 4) /* "ds_cpl" CPL-qualified (filtered) Debug Store */ #define X86_FEATURE_VMX ( 4*32+ 5) /* Hardware virtualization */ -#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer mode */ +#define X86_FEATURE_SMX ( 4*32+ 6) /* Safer Mode eXtensions */ #define X86_FEATURE_EST ( 4*32+ 7) /* Enhanced SpeedStep */ #define X86_FEATURE_TM2 ( 4*32+ 8) /* Thermal Monitor 2 */ #define X86_FEATURE_SSSE3 ( 4*32+ 9) /* Supplemental SSE-3 */ #define X86_FEATURE_CID ( 4*32+10) /* Context ID */ #define X86_FEATURE_SDBG ( 4*32+11) /* Silicon Debug */ #define X86_FEATURE_FMA ( 4*32+12) /* Fused multiply-add */ -#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B */ +#define X86_FEATURE_CX16 ( 4*32+13) /* CMPXCHG16B instruction */ #define X86_FEATURE_XTPR ( 4*32+14) /* Send Task Priority Messages */ -#define X86_FEATURE_PDCM ( 4*32+15) /* Performance Capabilities */ +#define X86_FEATURE_PDCM ( 4*32+15) /* Perf/Debug Capabilities MSR */ #define X86_FEATURE_PCID ( 4*32+17) /* Process Context Identifiers */ #define X86_FEATURE_DCA ( 4*32+18) /* Direct Cache Access */ #define X86_FEATURE_XMM4_1 ( 4*32+19) /* "sse4_1" SSE-4.1 */ #define X86_FEATURE_XMM4_2 ( 4*32+20) /* "sse4_2" SSE-4.2 */ -#define X86_FEATURE_X2APIC ( 4*32+21) /* x2APIC */ +#define X86_FEATURE_X2APIC ( 4*32+21) /* X2APIC */ #define X86_FEATURE_MOVBE ( 4*32+22) /* MOVBE instruction */ #define X86_FEATURE_POPCNT ( 4*32+23) /* POPCNT instruction */ -#define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* Tsc deadline timer */ +#define X86_FEATURE_TSC_DEADLINE_TIMER ( 4*32+24) /* TSC deadline timer */ #define X86_FEATURE_AES ( 4*32+25) /* AES instructions */ -#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV */ -#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE enabled in the OS */ +#define X86_FEATURE_XSAVE ( 4*32+26) /* XSAVE/XRSTOR/XSETBV/XGETBV instructions */ +#define X86_FEATURE_OSXSAVE ( 4*32+27) /* "" XSAVE instruction enabled in the OS */ #define X86_FEATURE_AVX ( 4*32+28) /* Advanced Vector Extensions */ -#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit fp conversions */ -#define X86_FEATURE_RDRAND ( 4*32+30) /* The RDRAND instruction */ +#define X86_FEATURE_F16C ( 4*32+29) /* 16-bit FP conversions */ +#define X86_FEATURE_RDRAND ( 4*32+30) /* RDRAND instruction */ #define X86_FEATURE_HYPERVISOR ( 4*32+31) /* Running on a hypervisor */ /* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */ @@ -158,10 +156,10 @@ #define X86_FEATURE_PMM ( 5*32+12) /* PadLock Montgomery Multiplier */ #define X86_FEATURE_PMM_EN ( 5*32+13) /* PMM enabled */ -/* More extended AMD flags: CPUID level 0x80000001, ecx, word 6 */ +/* More extended AMD flags: CPUID level 0x80000001, ECX, word 6 */ #define X86_FEATURE_LAHF_LM ( 6*32+ 0) /* LAHF/SAHF in long mode */ #define X86_FEATURE_CMP_LEGACY ( 6*32+ 1) /* If yes HyperThreading not valid */ -#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure virtual machine */ +#define X86_FEATURE_SVM ( 6*32+ 2) /* Secure Virtual Machine */ #define X86_FEATURE_EXTAPIC ( 6*32+ 3) /* Extended APIC space */ #define X86_FEATURE_CR8_LEGACY ( 6*32+ 4) /* CR8 in 32-bit mode */ #define X86_FEATURE_ABM ( 6*32+ 5) /* Advanced bit manipulation */ @@ -175,16 +173,16 @@ #define X86_FEATURE_WDT ( 6*32+13) /* Watchdog timer */ #define X86_FEATURE_LWP ( 6*32+15) /* Light Weight Profiling */ #define X86_FEATURE_FMA4 ( 6*32+16) /* 4 operands MAC instructions */ -#define X86_FEATURE_TCE ( 6*32+17) /* translation cache 
extension */ +#define X86_FEATURE_TCE ( 6*32+17) /* Translation Cache Extension */ #define X86_FEATURE_NODEID_MSR ( 6*32+19) /* NodeId MSR */ -#define X86_FEATURE_TBM ( 6*32+21) /* trailing bit manipulations */ -#define X86_FEATURE_TOPOEXT ( 6*32+22) /* topology extensions CPUID leafs */ -#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* core performance counter extensions */ +#define X86_FEATURE_TBM ( 6*32+21) /* Trailing Bit Manipulations */ +#define X86_FEATURE_TOPOEXT ( 6*32+22) /* Topology extensions CPUID leafs */ +#define X86_FEATURE_PERFCTR_CORE ( 6*32+23) /* Core performance counter extensions */ #define X86_FEATURE_PERFCTR_NB ( 6*32+24) /* NB performance counter extensions */ -#define X86_FEATURE_BPEXT (6*32+26) /* data breakpoint extension */ -#define X86_FEATURE_PTSC ( 6*32+27) /* performance time-stamp counter */ +#define X86_FEATURE_BPEXT ( 6*32+26) /* Data breakpoint extension */ +#define X86_FEATURE_PTSC ( 6*32+27) /* Performance time-stamp counter */ #define X86_FEATURE_PERFCTR_LLC ( 6*32+28) /* Last Level Cache performance counter extensions */ -#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX) */ +#define X86_FEATURE_MWAITX ( 6*32+29) /* MWAIT extension (MONITORX/MWAITX instructions) */ /* * Auxiliary flags: Linux defined - For features scattered in various @@ -192,7 +190,7 @@ * * Reuse free bits when adding new feature flags! */ -#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT */ +#define X86_FEATURE_RING3MWAIT ( 7*32+ 0) /* Ring 3 MONITOR/MWAIT instructions */ #define X86_FEATURE_CPUID_FAULT ( 7*32+ 1) /* Intel CPUID faulting */ #define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */ #define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */ @@ -206,8 +204,8 @@ #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ -#define X86_FEATURE_AVX512_4VNNIW (7*32+16) /* AVX-512 Neural Network Instructions */ -#define X86_FEATURE_AVX512_4FMAPS (7*32+17) /* AVX-512 Multiply Accumulation Single precision */ +#define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */ +#define X86_FEATURE_AVX512_4FMAPS ( 7*32+17) /* AVX-512 Multiply Accumulation Single precision */ #define X86_FEATURE_MBA ( 7*32+18) /* Memory Bandwidth Allocation */ @@ -218,19 +216,19 @@ #define X86_FEATURE_EPT ( 8*32+ 3) /* Intel Extended Page Table */ #define X86_FEATURE_VPID ( 8*32+ 4) /* Intel Virtual Processor ID */ -#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer vmmcall to vmcall */ +#define X86_FEATURE_VMMCALL ( 8*32+15) /* Prefer VMMCALL to VMCALL */ #define X86_FEATURE_XENPV ( 8*32+16) /* "" Xen paravirtual guest */ -/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */ -#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/ -#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3b */ +/* Intel-defined CPU features, CPUID level 0x00000007:0 (EBX), word 9 */ +#define X86_FEATURE_FSGSBASE ( 9*32+ 0) /* RDFSBASE, WRFSBASE, RDGSBASE, WRGSBASE instructions*/ +#define X86_FEATURE_TSC_ADJUST ( 9*32+ 1) /* TSC adjustment MSR 0x3B */ #define X86_FEATURE_BMI1 ( 9*32+ 3) /* 1st group bit manipulation extensions */ #define X86_FEATURE_HLE ( 9*32+ 4) /* Hardware Lock Elision */ #define X86_FEATURE_AVX2 ( 9*32+ 5) /* AVX2 instructions */ #define X86_FEATURE_SMEP ( 9*32+ 7) /* Supervisor Mode Execution Protection */ #define X86_FEATURE_BMI2 ( 9*32+ 8) /* 2nd group bit manipulation 
extensions */ -#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB */ +#define X86_FEATURE_ERMS ( 9*32+ 9) /* Enhanced REP MOVSB/STOSB instructions */ #define X86_FEATURE_INVPCID ( 9*32+10) /* Invalidate Processor Context ID */ #define X86_FEATURE_RTM ( 9*32+11) /* Restricted Transactional Memory */ #define X86_FEATURE_CQM ( 9*32+12) /* Cache QoS Monitoring */ @@ -238,8 +236,8 @@ #define X86_FEATURE_RDT_A ( 9*32+15) /* Resource Director Technology Allocation */ #define X86_FEATURE_AVX512F ( 9*32+16) /* AVX-512 Foundation */ #define X86_FEATURE_AVX512DQ ( 9*32+17) /* AVX-512 DQ (Double/Quad granular) Instructions */ -#define X86_FEATURE_RDSEED ( 9*32+18) /* The RDSEED instruction */ -#define X86_FEATURE_ADX ( 9*32+19) /* The ADCX and ADOX instructions */ +#define X86_FEATURE_RDSEED ( 9*32+18) /* RDSEED instruction */ +#define X86_FEATURE_ADX ( 9*32+19) /* ADCX and ADOX instructions */ #define X86_FEATURE_SMAP ( 9*32+20) /* Supervisor Mode Access Prevention */ #define X86_FEATURE_AVX512IFMA ( 9*32+21) /* AVX-512 Integer Fused Multiply-Add instructions */ #define X86_FEATURE_CLFLUSHOPT ( 9*32+23) /* CLFLUSHOPT instruction */ @@ -251,25 +249,25 @@ #define X86_FEATURE_AVX512BW ( 9*32+30) /* AVX-512 BW (Byte/Word granular) Instructions */ #define X86_FEATURE_AVX512VL ( 9*32+31) /* AVX-512 VL (128/256 Vector Length) Extensions */ -/* Extended state features, CPUID level 0x0000000d:1 (eax), word 10 */ -#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT */ -#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC */ -#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 */ -#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS */ +/* Extended state features, CPUID level 0x0000000d:1 (EAX), word 10 */ +#define X86_FEATURE_XSAVEOPT (10*32+ 0) /* XSAVEOPT instruction */ +#define X86_FEATURE_XSAVEC (10*32+ 1) /* XSAVEC instruction */ +#define X86_FEATURE_XGETBV1 (10*32+ 2) /* XGETBV with ECX = 1 instruction */ +#define X86_FEATURE_XSAVES (10*32+ 3) /* XSAVES/XRSTORS instructions */ -/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (edx), word 11 */ +/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:0 (EDX), word 11 */ #define X86_FEATURE_CQM_LLC (11*32+ 1) /* LLC QoS if 1 */ -/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (edx), word 12 */ -#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring if 1 */ +/* Intel-defined CPU QoS Sub-leaf, CPUID level 0x0000000F:1 (EDX), word 12 */ +#define X86_FEATURE_CQM_OCCUP_LLC (12*32+ 0) /* LLC occupancy monitoring */ #define X86_FEATURE_CQM_MBM_TOTAL (12*32+ 1) /* LLC Total MBM monitoring */ #define X86_FEATURE_CQM_MBM_LOCAL (12*32+ 2) /* LLC Local MBM monitoring */ -/* AMD-defined CPU features, CPUID level 0x80000008 (ebx), word 13 */ -#define X86_FEATURE_CLZERO (13*32+0) /* CLZERO instruction */ -#define X86_FEATURE_IRPERF (13*32+1) /* Instructions Retired Count */ +/* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ +#define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ +#define X86_FEATURE_IRPERF (13*32+ 1) /* Instructions Retired Count */ -/* Thermal and Power Management Leaf, CPUID level 0x00000006 (eax), word 14 */ +/* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ #define X86_FEATURE_IDA (14*32+ 1) /* Intel Dynamic Acceleration */ #define X86_FEATURE_ARAT (14*32+ 2) /* Always Running APIC Timer */ @@ -281,7 +279,7 @@ #define X86_FEATURE_HWP_EPP (14*32+10) /* HWP Energy 
Perf. Preference */ #define X86_FEATURE_HWP_PKG_REQ (14*32+11) /* HWP Package Level Request */ -/* AMD SVM Feature Identification, CPUID level 0x8000000a (edx), word 15 */ +/* AMD SVM Feature Identification, CPUID level 0x8000000a (EDX), word 15 */ #define X86_FEATURE_NPT (15*32+ 0) /* Nested Page Table support */ #define X86_FEATURE_LBRV (15*32+ 1) /* LBR Virtualization support */ #define X86_FEATURE_SVML (15*32+ 2) /* "svm_lock" SVM locking MSR */ @@ -296,24 +294,24 @@ #define X86_FEATURE_V_VMSAVE_VMLOAD (15*32+15) /* Virtual VMSAVE VMLOAD */ #define X86_FEATURE_VGIF (15*32+16) /* Virtual GIF */ -/* Intel-defined CPU features, CPUID level 0x00000007:0 (ecx), word 16 */ +/* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ #define X86_FEATURE_GFNI (16*32+ 8) /* Galois Field New Instructions */ #define X86_FEATURE_VAES (16*32+ 9) /* Vector AES */ -#define X86_FEATURE_VPCLMULQDQ (16*32+ 10) /* Carry-Less Multiplication Double Quadword */ -#define X86_FEATURE_AVX512_VNNI (16*32+ 11) /* Vector Neural Network Instructions */ -#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB */ +#define X86_FEATURE_VPCLMULQDQ (16*32+10) /* Carry-Less Multiplication Double Quadword */ +#define X86_FEATURE_AVX512_VNNI (16*32+11) /* Vector Neural Network Instructions */ +#define X86_FEATURE_AVX512_BITALG (16*32+12) /* Support for VPOPCNT[B,W] and VPSHUF-BITQMB instructions */ #define X86_FEATURE_AVX512_VPOPCNTDQ (16*32+14) /* POPCNT for vectors of DW/QW */ #define X86_FEATURE_LA57 (16*32+16) /* 5-level page tables */ #define X86_FEATURE_RDPID (16*32+22) /* RDPID instruction */ -/* AMD-defined CPU features, CPUID level 0x80000007 (ebx), word 17 */ -#define X86_FEATURE_OVERFLOW_RECOV (17*32+0) /* MCA overflow recovery support */ -#define X86_FEATURE_SUCCOR (17*32+1) /* Uncorrectable error containment and recovery */ -#define X86_FEATURE_SMCA (17*32+3) /* Scalable MCA */ +/* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */ +#define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */ +#define X86_FEATURE_SUCCOR (17*32+ 1) /* Uncorrectable error containment and recovery */ +#define X86_FEATURE_SMCA (17*32+ 3) /* Scalable MCA */ /* * BUG word(s) @@ -340,4 +338,5 @@ #define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ #define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ + #endif /* _ASM_X86_CPUFEATURES_H */ From 46e6a15b40c99402aedde940f157614e0c927c7e Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Sat, 4 Nov 2017 04:19:50 -0700 Subject: [PATCH 0615/1013] selftests/x86/ldt_gdt: Add infrastructure to test set_thread_area() commit d744dcad39094c9187075e274d1cdef79c57c8b5 upstream. Much of the test design could apply to set_thread_area() (i.e. GDT), not just modify_ldt(). Add set_thread_area() to the install_valid_mode() helper. 
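Both system calls consume the same struct user_desc, which is what makes a shared helper practical. As a rough user-space illustration (a hedged sketch, not the selftest itself; apart from the 0x11 modify_ldt function code and the -1 "pick a free TLS slot" entry_number, the names and values are assumptions):

#include <asm/ldt.h>          /* struct user_desc */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Install the same descriptor either into the LDT or into the GDT. */
static long install_desc(const struct user_desc *d, int use_ldt)
{
	struct user_desc desc = *d;

	if (use_ldt)
		return syscall(SYS_modify_ldt, 0x11, &desc, sizeof(desc));

	desc.entry_number = (unsigned int)-1; /* let the kernel pick a TLS slot */
	return syscall(SYS_set_thread_area, &desc);
}

int main(void)
{
	struct user_desc desc;

	memset(&desc, 0, sizeof(desc));
	desc.limit = 10;
	desc.seg_32bit = 1;
	desc.contents = 0;		/* data segment */

	install_desc(&desc, 1);		/* modify_ldt() path */
	install_desc(&desc, 0);		/* set_thread_area() path */
	return 0;
}

On 64-bit builds the selftest simply skips the set_thread_area() path, as the hunk below shows.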
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/02c23f8fba5547007f741dc24c3926e5284ede02.1509794321.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/x86/ldt_gdt.c | 53 +++++++++++++++++++-------- 1 file changed, 37 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c index 2afc41a3730f..4e9add7bf8a9 100644 --- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -137,30 +137,51 @@ static void check_valid_segment(uint16_t index, int ldt, } } -static bool install_valid_mode(const struct user_desc *desc, uint32_t ar, - bool oldmode) +static bool install_valid_mode(const struct user_desc *d, uint32_t ar, + bool oldmode, bool ldt) { - int ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11, - desc, sizeof(*desc)); - if (ret < -1) - errno = -ret; + struct user_desc desc = *d; + int ret; + + if (!ldt) { +#ifndef __i386__ + /* No point testing set_thread_area in a 64-bit build */ + return false; +#endif + if (!gdt_entry_num) + return false; + desc.entry_number = gdt_entry_num; + + ret = syscall(SYS_set_thread_area, &desc); + } else { + ret = syscall(SYS_modify_ldt, oldmode ? 1 : 0x11, + &desc, sizeof(desc)); + + if (ret < -1) + errno = -ret; + + if (ret != 0 && errno == ENOSYS) { + printf("[OK]\tmodify_ldt returned -ENOSYS\n"); + return false; + } + } + if (ret == 0) { - uint32_t limit = desc->limit; - if (desc->limit_in_pages) + uint32_t limit = desc.limit; + if (desc.limit_in_pages) limit = (limit << 12) + 4095; - check_valid_segment(desc->entry_number, 1, ar, limit, true); + check_valid_segment(desc.entry_number, ldt, ar, limit, true); return true; - } else if (errno == ENOSYS) { - printf("[OK]\tmodify_ldt returned -ENOSYS\n"); - return false; } else { - if (desc->seg_32bit) { - printf("[FAIL]\tUnexpected modify_ldt failure %d\n", + if (desc.seg_32bit) { + printf("[FAIL]\tUnexpected %s failure %d\n", + ldt ? "modify_ldt" : "set_thread_area", errno); nerrs++; return false; } else { - printf("[OK]\tmodify_ldt rejected 16 bit segment\n"); + printf("[OK]\t%s rejected 16 bit segment\n", + ldt ? "modify_ldt" : "set_thread_area"); return false; } } @@ -168,7 +189,7 @@ static bool install_valid_mode(const struct user_desc *desc, uint32_t ar, static bool install_valid(const struct user_desc *desc, uint32_t ar) { - return install_valid_mode(desc, ar, false); + return install_valid_mode(desc, ar, false, true); } static void install_invalid(const struct user_desc *desc, bool oldmode) From 4a464a66db6d574dd622b5218ca617da41461149 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Sat, 4 Nov 2017 04:19:51 -0700 Subject: [PATCH 0616/1013] selftests/x86/ldt_gdt: Run most existing LDT test cases against the GDT as well commit adedf2893c192dd09b1cc2f2dcfdd7cad99ec49d upstream. Now that the main test infrastructure supports the GDT, run tests that will pass the kernel's GDT permission tests against the GDT. 
Signed-off-by: Andy Lutomirski Cc: Borislav Petkov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/686a1eda63414da38fcecc2412db8dba1ae40581.1509794321.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/x86/ldt_gdt.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c index 4e9add7bf8a9..66e5ce5b91f0 100644 --- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -189,7 +189,15 @@ static bool install_valid_mode(const struct user_desc *d, uint32_t ar, static bool install_valid(const struct user_desc *desc, uint32_t ar) { - return install_valid_mode(desc, ar, false, true); + bool ret = install_valid_mode(desc, ar, false, true); + + if (desc->contents <= 1 && desc->seg_32bit && + !desc->seg_not_present) { + /* Should work in the GDT, too. */ + install_valid_mode(desc, ar, false, false); + } + + return ret; } static void install_invalid(const struct user_desc *desc, bool oldmode) From 99aee22dca890fb835570fa3a2a72ea43cca93ae Mon Sep 17 00:00:00 2001 From: James Morse Date: Mon, 6 Nov 2017 18:44:24 +0000 Subject: [PATCH 0617/1013] ACPI / APEI: Replace ioremap_page_range() with fixmap commit 4f89fa286f6729312e227e7c2d764e8e7b9d340e upstream. Replace ghes_io{re,un}map_pfn_{nmi,irq}()s use of ioremap_page_range() with __set_fixmap() as ioremap_page_range() may sleep to allocate a new level of page-table, even if its passed an existing final-address to use in the mapping. The GHES driver can only be enabled for architectures that select HAVE_ACPI_APEI: Add fixmap entries to both x86 and arm64. clear_fixmap() does the TLB invalidation in __set_fixmap() for arm64 and __set_pte_vaddr() for x86. In each case its the same as the respective arch_apei_flush_tlb_one(). Reported-by: Fengguang Wu Suggested-by: Linus Torvalds Signed-off-by: James Morse Reviewed-by: Borislav Petkov Tested-by: Tyler Baicar Tested-by: Toshi Kani [ For the arm64 bits: ] Acked-by: Will Deacon [ For the x86 bits: ] Acked-by: Ingo Molnar Signed-off-by: Rafael J. 
Wysocki Cc: All applicable Signed-off-by: Greg Kroah-Hartman --- arch/arm64/include/asm/fixmap.h | 7 ++++++ arch/x86/include/asm/fixmap.h | 6 +++++ drivers/acpi/apei/ghes.c | 44 +++++++++++---------------------- 3 files changed, 27 insertions(+), 30 deletions(-) diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h index caf86be815ba..4052ec39e8db 100644 --- a/arch/arm64/include/asm/fixmap.h +++ b/arch/arm64/include/asm/fixmap.h @@ -51,6 +51,13 @@ enum fixed_addresses { FIX_EARLYCON_MEM_BASE, FIX_TEXT_POKE0, + +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif /* CONFIG_ACPI_APEI_GHES */ + __end_of_permanent_fixed_addresses, /* diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index dcd9fb55e679..b0c505fe9a95 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -104,6 +104,12 @@ enum fixed_addresses { FIX_GDT_REMAP_BEGIN, FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, +#ifdef CONFIG_ACPI_APEI_GHES + /* Used for GHES mapping from assorted contexts */ + FIX_APEI_GHES_IRQ, + FIX_APEI_GHES_NMI, +#endif + __end_of_permanent_fixed_addresses, /* diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c index cb7aceae3553..572b6c7303ed 100644 --- a/drivers/acpi/apei/ghes.c +++ b/drivers/acpi/apei/ghes.c @@ -51,6 +51,7 @@ #include #include #include +#include #include #include @@ -112,7 +113,7 @@ static DEFINE_MUTEX(ghes_list_mutex); * Because the memory area used to transfer hardware error information * from BIOS to Linux can be determined only in NMI, IRQ or timer * handler, but general ioremap can not be used in atomic context, so - * a special version of atomic ioremap is implemented for that. + * the fixmap is used instead. */ /* @@ -126,8 +127,8 @@ static DEFINE_MUTEX(ghes_list_mutex); /* virtual memory area for atomic ioremap */ static struct vm_struct *ghes_ioremap_area; /* - * These 2 spinlock is used to prevent atomic ioremap virtual memory - * area from being mapped simultaneously. + * These 2 spinlocks are used to prevent the fixmap entries from being used + * simultaneously. 
*/ static DEFINE_RAW_SPINLOCK(ghes_ioremap_lock_nmi); static DEFINE_SPINLOCK(ghes_ioremap_lock_irq); @@ -159,53 +160,36 @@ static void ghes_ioremap_exit(void) static void __iomem *ghes_ioremap_pfn_nmi(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_NMI_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); + __set_fixmap(FIX_APEI_GHES_NMI, paddr, prot); - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_NMI); } static void __iomem *ghes_ioremap_pfn_irq(u64 pfn) { - unsigned long vaddr; phys_addr_t paddr; pgprot_t prot; - vaddr = (unsigned long)GHES_IOREMAP_IRQ_PAGE(ghes_ioremap_area->addr); - paddr = pfn << PAGE_SHIFT; prot = arch_apei_get_mem_attribute(paddr); + __set_fixmap(FIX_APEI_GHES_IRQ, paddr, prot); - ioremap_page_range(vaddr, vaddr + PAGE_SIZE, paddr, prot); - - return (void __iomem *)vaddr; + return (void __iomem *) fix_to_virt(FIX_APEI_GHES_IRQ); } -static void ghes_iounmap_nmi(void __iomem *vaddr_ptr) +static void ghes_iounmap_nmi(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_NMI_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_NMI); } -static void ghes_iounmap_irq(void __iomem *vaddr_ptr) +static void ghes_iounmap_irq(void) { - unsigned long vaddr = (unsigned long __force)vaddr_ptr; - void *base = ghes_ioremap_area->addr; - - BUG_ON(vaddr != (unsigned long)GHES_IOREMAP_IRQ_PAGE(base)); - unmap_kernel_range_noflush(vaddr, PAGE_SIZE); - arch_apei_flush_tlb_one(vaddr); + clear_fixmap(FIX_APEI_GHES_IRQ); } static int ghes_estatus_pool_init(void) @@ -361,10 +345,10 @@ static void ghes_copy_tofrom_phys(void *buffer, u64 paddr, u32 len, paddr += trunk; buffer += trunk; if (in_nmi) { - ghes_iounmap_nmi(vaddr); + ghes_iounmap_nmi(); raw_spin_unlock(&ghes_ioremap_lock_nmi); } else { - ghes_iounmap_irq(vaddr); + ghes_iounmap_irq(); spin_unlock_irqrestore(&ghes_ioremap_lock_irq, flags); } } From 04d26709b13e72b02d5e435c6a3bebb63167ac71 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 9 Nov 2017 14:27:35 +0100 Subject: [PATCH 0618/1013] x86/virt, x86/platform: Merge 'struct x86_hyper' into 'struct x86_platform' and 'struct x86_init' commit f72e38e8ec8869ac0ba5a75d7d2f897d98a1454e upstream. Instead of x86_hyper being either NULL on bare metal or a pointer to a struct hypervisor_x86 in case of the kernel running as a guest merge the struct into x86_platform and x86_init. This will remove the need for wrappers making it hard to find out what is being called. With dummy functions added for all callbacks testing for a NULL function pointer can be removed, too. 
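In miniature, the arrangement looks like the following sketch (illustrative names only, not the actual x86_init/x86_platform definitions; like the kernel's copy_array(), it assumes the ops structure can be treated as an array of pointers with identical representation):

#include <stdbool.h>
#include <stddef.h>

struct hyper_init_ops {
	void (*init_platform)(void);
	bool (*x2apic_available)(void);
	void (*init_mem_mapping)(void);
};

/* Default no-op callbacks, so call sites never test for NULL. */
static void noop(void) { }
static bool return_false(void) { return false; }

static struct hyper_init_ops hyper_init = {
	.init_platform    = noop,
	.x2apic_available = return_false,
	.init_mem_mapping = noop,
};

/*
 * Overwrite only the slots a detected guest actually provides,
 * mirroring the copy_array() idea in the patch.
 */
static void copy_nonnull(const void *src, void *dst, size_t size)
{
	const void * const *from = src;
	const void **to = dst;
	size_t i, n = size / sizeof(void *);

	for (i = 0; i < n; i++)
		if (from[i])
			to[i] = from[i];
}

void init_hypervisor_example(const struct hyper_init_ops *detected)
{
	if (detected)
		copy_nonnull(detected, &hyper_init, sizeof(hyper_init));

	hyper_init.init_platform();	/* unconditional call, no NULL check */
}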
Suggested-by: Ingo Molnar Signed-off-by: Juergen Gross Acked-by: Thomas Gleixner Cc: Linus Torvalds Cc: Peter Zijlstra Cc: akataria@vmware.com Cc: boris.ostrovsky@oracle.com Cc: devel@linuxdriverproject.org Cc: haiyangz@microsoft.com Cc: kvm@vger.kernel.org Cc: kys@microsoft.com Cc: pbonzini@redhat.com Cc: rkrcmar@redhat.com Cc: rusty@rustcorp.com.au Cc: sthemmin@microsoft.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20171109132739.23465-2-jgross@suse.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/hypervisor.h | 25 +++----------- arch/x86/include/asm/x86_init.h | 24 ++++++++++++++ arch/x86/kernel/apic/apic.c | 2 +- arch/x86/kernel/cpu/hypervisor.c | 54 +++++++++++++++---------------- arch/x86/kernel/cpu/mshyperv.c | 2 +- arch/x86/kernel/cpu/vmware.c | 4 +-- arch/x86/kernel/kvm.c | 2 +- arch/x86/kernel/x86_init.c | 9 ++++++ arch/x86/mm/init.c | 2 +- arch/x86/xen/enlighten_hvm.c | 8 ++--- arch/x86/xen/enlighten_pv.c | 2 +- include/linux/hypervisor.h | 8 +++-- 12 files changed, 81 insertions(+), 61 deletions(-) diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h index 0ead9dbb9130..0eca7239a7aa 100644 --- a/arch/x86/include/asm/hypervisor.h +++ b/arch/x86/include/asm/hypervisor.h @@ -23,6 +23,7 @@ #ifdef CONFIG_HYPERVISOR_GUEST #include +#include #include /* @@ -35,17 +36,11 @@ struct hypervisor_x86 { /* Detection routine */ uint32_t (*detect)(void); - /* Platform setup (run once per boot) */ - void (*init_platform)(void); + /* init time callbacks */ + struct x86_hyper_init init; - /* X2APIC detection (run once per boot) */ - bool (*x2apic_available)(void); - - /* pin current vcpu to specified physical cpu (run rarely) */ - void (*pin_vcpu)(int); - - /* called during init_mem_mapping() to setup early mappings. 
*/ - void (*init_mem_mapping)(void); + /* runtime callbacks */ + struct x86_hyper_runtime runtime; }; extern const struct hypervisor_x86 *x86_hyper; @@ -58,17 +53,7 @@ extern const struct hypervisor_x86 x86_hyper_xen_hvm; extern const struct hypervisor_x86 x86_hyper_kvm; extern void init_hypervisor_platform(void); -extern bool hypervisor_x2apic_available(void); -extern void hypervisor_pin_vcpu(int cpu); - -static inline void hypervisor_init_mem_mapping(void) -{ - if (x86_hyper && x86_hyper->init_mem_mapping) - x86_hyper->init_mem_mapping(); -} #else static inline void init_hypervisor_platform(void) { } -static inline bool hypervisor_x2apic_available(void) { return false; } -static inline void hypervisor_init_mem_mapping(void) { } #endif /* CONFIG_HYPERVISOR_GUEST */ #endif /* _ASM_X86_HYPERVISOR_H */ diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h index 8a1ebf9540dd..ad15a0fda917 100644 --- a/arch/x86/include/asm/x86_init.h +++ b/arch/x86/include/asm/x86_init.h @@ -114,6 +114,18 @@ struct x86_init_pci { void (*fixup_irqs)(void); }; +/** + * struct x86_hyper_init - x86 hypervisor init functions + * @init_platform: platform setup + * @x2apic_available: X2APIC detection + * @init_mem_mapping: setup early mappings during init_mem_mapping() + */ +struct x86_hyper_init { + void (*init_platform)(void); + bool (*x2apic_available)(void); + void (*init_mem_mapping)(void); +}; + /** * struct x86_init_ops - functions for platform specific setup * @@ -127,6 +139,7 @@ struct x86_init_ops { struct x86_init_timers timers; struct x86_init_iommu iommu; struct x86_init_pci pci; + struct x86_hyper_init hyper; }; /** @@ -199,6 +212,15 @@ struct x86_legacy_features { struct x86_legacy_devices devices; }; +/** + * struct x86_hyper_runtime - x86 hypervisor specific runtime callbacks + * + * @pin_vcpu: pin current vcpu to specified physical cpu (run rarely) + */ +struct x86_hyper_runtime { + void (*pin_vcpu)(int cpu); +}; + /** * struct x86_platform_ops - platform specific runtime functions * @calibrate_cpu: calibrate CPU @@ -218,6 +240,7 @@ struct x86_legacy_features { * possible in x86_early_init_platform_quirks() by * only using the current x86_hardware_subarch * semantics. 
+ * @hyper: x86 hypervisor specific runtime callbacks */ struct x86_platform_ops { unsigned long (*calibrate_cpu)(void); @@ -233,6 +256,7 @@ struct x86_platform_ops { void (*apic_post_init)(void); struct x86_legacy_features legacy; void (*set_legacy_features)(void); + struct x86_hyper_runtime hyper; }; struct pci_dev; diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index ff891772c9f8..89c7c8569e5e 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -1645,7 +1645,7 @@ static __init void try_to_enable_x2apic(int remap_mode) * under KVM */ if (max_physical_apicid > 255 || - !hypervisor_x2apic_available()) { + !x86_init.hyper.x2apic_available()) { pr_info("x2apic: IRQ remapping doesn't support X2APIC mode\n"); x2apic_disable(); return; diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c index 4fa90006ac68..22226c1bf092 100644 --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -44,51 +44,49 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] = const struct hypervisor_x86 *x86_hyper; EXPORT_SYMBOL(x86_hyper); -static inline void __init +static inline const struct hypervisor_x86 * __init detect_hypervisor_vendor(void) { - const struct hypervisor_x86 *h, * const *p; + const struct hypervisor_x86 *h = NULL, * const *p; uint32_t pri, max_pri = 0; for (p = hypervisors; p < hypervisors + ARRAY_SIZE(hypervisors); p++) { - h = *p; - pri = h->detect(); - if (pri != 0 && pri > max_pri) { + pri = (*p)->detect(); + if (pri > max_pri) { max_pri = pri; - x86_hyper = h; + h = *p; } } - if (max_pri) - pr_info("Hypervisor detected: %s\n", x86_hyper->name); + if (h) + pr_info("Hypervisor detected: %s\n", h->name); + + return h; } -void __init init_hypervisor_platform(void) +static void __init copy_array(const void *src, void *target, unsigned int size) { + unsigned int i, n = size / sizeof(void *); + const void * const *from = (const void * const *)src; + const void **to = (const void **)target; - detect_hypervisor_vendor(); - - if (!x86_hyper) - return; - - if (x86_hyper->init_platform) - x86_hyper->init_platform(); + for (i = 0; i < n; i++) + if (from[i]) + to[i] = from[i]; } -bool __init hypervisor_x2apic_available(void) +void __init init_hypervisor_platform(void) { - return x86_hyper && - x86_hyper->x2apic_available && - x86_hyper->x2apic_available(); -} + const struct hypervisor_x86 *h; -void hypervisor_pin_vcpu(int cpu) -{ - if (!x86_hyper) + h = detect_hypervisor_vendor(); + + if (!h) return; - if (x86_hyper->pin_vcpu) - x86_hyper->pin_vcpu(cpu); - else - WARN_ONCE(1, "vcpu pinning requested but not supported!\n"); + copy_array(&h->init, &x86_init.hyper, sizeof(h->init)); + copy_array(&h->runtime, &x86_platform.hyper, sizeof(h->runtime)); + + x86_hyper = h; + x86_init.hyper.init_platform(); } diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 236324e83a3a..6bb84d655e4b 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -257,6 +257,6 @@ static void __init ms_hyperv_init_platform(void) const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = { .name = "Microsoft Hyper-V", .detect = ms_hyperv_platform, - .init_platform = ms_hyperv_init_platform, + .init.init_platform = ms_hyperv_init_platform, }; EXPORT_SYMBOL(x86_hyper_ms_hyperv); diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c index 40ed26852ebd..4804c1d063c8 100644 --- a/arch/x86/kernel/cpu/vmware.c +++ b/arch/x86/kernel/cpu/vmware.c @@ 
-208,7 +208,7 @@ static bool __init vmware_legacy_x2apic_available(void) const __refconst struct hypervisor_x86 x86_hyper_vmware = { .name = "VMware", .detect = vmware_platform, - .init_platform = vmware_platform_setup, - .x2apic_available = vmware_legacy_x2apic_available, + .init.init_platform = vmware_platform_setup, + .init.x2apic_available = vmware_legacy_x2apic_available, }; EXPORT_SYMBOL(x86_hyper_vmware); diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 8bb9594d0761..9dca8437c795 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -547,7 +547,7 @@ static uint32_t __init kvm_detect(void) const struct hypervisor_x86 x86_hyper_kvm __refconst = { .name = "KVM", .detect = kvm_detect, - .x2apic_available = kvm_para_available, + .init.x2apic_available = kvm_para_available, }; EXPORT_SYMBOL_GPL(x86_hyper_kvm); diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index a088b2c47f73..5b2d10c1973a 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -28,6 +28,8 @@ void x86_init_noop(void) { } void __init x86_init_uint_noop(unsigned int unused) { } int __init iommu_init_noop(void) { return 0; } void iommu_shutdown_noop(void) { } +bool __init bool_x86_init_noop(void) { return false; } +void x86_op_int_noop(int cpu) { } /* * The platform setup functions are preset with the default functions @@ -81,6 +83,12 @@ struct x86_init_ops x86_init __initdata = { .init_irq = x86_default_pci_init_irq, .fixup_irqs = x86_default_pci_fixup_irqs, }, + + .hyper = { + .init_platform = x86_init_noop, + .x2apic_available = bool_x86_init_noop, + .init_mem_mapping = x86_init_noop, + }, }; struct x86_cpuinit_ops x86_cpuinit = { @@ -101,6 +109,7 @@ struct x86_platform_ops x86_platform __ro_after_init = { .get_nmi_reason = default_get_nmi_reason, .save_sched_clock_state = tsc_save_sched_clock_state, .restore_sched_clock_state = tsc_restore_sched_clock_state, + .hyper.pin_vcpu = x86_op_int_noop, }; EXPORT_SYMBOL_GPL(x86_platform); diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index af5c1ed21d43..a22c2b95e513 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -671,7 +671,7 @@ void __init init_mem_mapping(void) load_cr3(swapper_pg_dir); __flush_tlb_all(); - hypervisor_init_mem_mapping(); + x86_init.hyper.init_mem_mapping(); early_memtest(0, max_pfn_mapped << PAGE_SHIFT); } diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c index de503c225ae1..7b1622089f96 100644 --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -229,9 +229,9 @@ static uint32_t __init xen_platform_hvm(void) const struct hypervisor_x86 x86_hyper_xen_hvm = { .name = "Xen HVM", .detect = xen_platform_hvm, - .init_platform = xen_hvm_guest_init, - .pin_vcpu = xen_pin_vcpu, - .x2apic_available = xen_x2apic_para_available, - .init_mem_mapping = xen_hvm_init_mem_mapping, + .init.init_platform = xen_hvm_guest_init, + .init.x2apic_available = xen_x2apic_para_available, + .init.init_mem_mapping = xen_hvm_init_mem_mapping, + .runtime.pin_vcpu = xen_pin_vcpu, }; EXPORT_SYMBOL(x86_hyper_xen_hvm); diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index e55d276afc70..57d63b0f1c0c 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1462,6 +1462,6 @@ static uint32_t __init xen_platform_pv(void) const struct hypervisor_x86 x86_hyper_xen_pv = { .name = "Xen PV", .detect = xen_platform_pv, - .pin_vcpu = xen_pin_vcpu, + .runtime.pin_vcpu = xen_pin_vcpu, }; EXPORT_SYMBOL(x86_hyper_xen_pv); diff --git 
a/include/linux/hypervisor.h b/include/linux/hypervisor.h index b4054fd5b6f6..b19563f9a8eb 100644 --- a/include/linux/hypervisor.h +++ b/include/linux/hypervisor.h @@ -7,8 +7,12 @@ * Juergen Gross */ -#ifdef CONFIG_HYPERVISOR_GUEST -#include +#ifdef CONFIG_X86 +#include +static inline void hypervisor_pin_vcpu(int cpu) +{ + x86_platform.hyper.pin_vcpu(cpu); +} #else static inline void hypervisor_pin_vcpu(int cpu) { From 399bbc9bb611e10dc5fd232b96b0bea18fca3482 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 9 Nov 2017 14:27:36 +0100 Subject: [PATCH 0619/1013] x86/virt: Add enum for hypervisors to replace x86_hyper commit 03b2a320b19f1424e9ac9c21696be9c60b6d0d93 upstream. The x86_hyper pointer is only used for checking whether a virtual device is supporting the hypervisor the system is running on. Use an enum for that purpose instead and drop the x86_hyper pointer. Signed-off-by: Juergen Gross Acked-by: Thomas Gleixner Acked-by: Xavier Deguillard Cc: Linus Torvalds Cc: Peter Zijlstra Cc: akataria@vmware.com Cc: arnd@arndb.de Cc: boris.ostrovsky@oracle.com Cc: devel@linuxdriverproject.org Cc: dmitry.torokhov@gmail.com Cc: gregkh@linuxfoundation.org Cc: haiyangz@microsoft.com Cc: kvm@vger.kernel.org Cc: kys@microsoft.com Cc: linux-graphics-maintainer@vmware.com Cc: linux-input@vger.kernel.org Cc: moltmann@vmware.com Cc: pbonzini@redhat.com Cc: pv-drivers@vmware.com Cc: rkrcmar@redhat.com Cc: sthemmin@microsoft.com Cc: virtualization@lists.linux-foundation.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20171109132739.23465-3-jgross@suse.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/hyperv/hv_init.c | 2 +- arch/x86/include/asm/hypervisor.h | 23 ++++++++++++++--------- arch/x86/kernel/cpu/hypervisor.c | 12 +++++++++--- arch/x86/kernel/cpu/mshyperv.c | 4 ++-- arch/x86/kernel/cpu/vmware.c | 4 ++-- arch/x86/kernel/kvm.c | 4 ++-- arch/x86/xen/enlighten_hvm.c | 4 ++-- arch/x86/xen/enlighten_pv.c | 4 ++-- drivers/hv/vmbus_drv.c | 2 +- drivers/input/mouse/vmmouse.c | 10 ++++------ drivers/misc/vmw_balloon.c | 2 +- 11 files changed, 40 insertions(+), 31 deletions(-) diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c index a5db63f728a2..a0b86cf486e0 100644 --- a/arch/x86/hyperv/hv_init.c +++ b/arch/x86/hyperv/hv_init.c @@ -113,7 +113,7 @@ void hyperv_init(void) u64 guest_id; union hv_x64_msr_hypercall_contents hypercall_msr; - if (x86_hyper != &x86_hyper_ms_hyperv) + if (x86_hyper_type != X86_HYPER_MS_HYPERV) return; /* Allocate percpu VP index */ diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h index 0eca7239a7aa..1b0a5abcd8ae 100644 --- a/arch/x86/include/asm/hypervisor.h +++ b/arch/x86/include/asm/hypervisor.h @@ -29,6 +29,16 @@ /* * x86 hypervisor information */ + +enum x86_hypervisor_type { + X86_HYPER_NATIVE = 0, + X86_HYPER_VMWARE, + X86_HYPER_MS_HYPERV, + X86_HYPER_XEN_PV, + X86_HYPER_XEN_HVM, + X86_HYPER_KVM, +}; + struct hypervisor_x86 { /* Hypervisor name */ const char *name; @@ -36,6 +46,9 @@ struct hypervisor_x86 { /* Detection routine */ uint32_t (*detect)(void); + /* Hypervisor type */ + enum x86_hypervisor_type type; + /* init time callbacks */ struct x86_hyper_init init; @@ -43,15 +56,7 @@ struct hypervisor_x86 { struct x86_hyper_runtime runtime; }; -extern const struct hypervisor_x86 *x86_hyper; - -/* Recognized hypervisors */ -extern const struct hypervisor_x86 x86_hyper_vmware; -extern const struct hypervisor_x86 x86_hyper_ms_hyperv; -extern const struct hypervisor_x86 
x86_hyper_xen_pv; -extern const struct hypervisor_x86 x86_hyper_xen_hvm; -extern const struct hypervisor_x86 x86_hyper_kvm; - +extern enum x86_hypervisor_type x86_hyper_type; extern void init_hypervisor_platform(void); #else static inline void init_hypervisor_platform(void) { } diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c index 22226c1bf092..bea8d3e24f50 100644 --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -26,6 +26,12 @@ #include #include +extern const struct hypervisor_x86 x86_hyper_vmware; +extern const struct hypervisor_x86 x86_hyper_ms_hyperv; +extern const struct hypervisor_x86 x86_hyper_xen_pv; +extern const struct hypervisor_x86 x86_hyper_xen_hvm; +extern const struct hypervisor_x86 x86_hyper_kvm; + static const __initconst struct hypervisor_x86 * const hypervisors[] = { #ifdef CONFIG_XEN_PV @@ -41,8 +47,8 @@ static const __initconst struct hypervisor_x86 * const hypervisors[] = #endif }; -const struct hypervisor_x86 *x86_hyper; -EXPORT_SYMBOL(x86_hyper); +enum x86_hypervisor_type x86_hyper_type; +EXPORT_SYMBOL(x86_hyper_type); static inline const struct hypervisor_x86 * __init detect_hypervisor_vendor(void) @@ -87,6 +93,6 @@ void __init init_hypervisor_platform(void) copy_array(&h->init, &x86_init.hyper, sizeof(h->init)); copy_array(&h->runtime, &x86_platform.hyper, sizeof(h->runtime)); - x86_hyper = h; + x86_hyper_type = h->type; x86_init.hyper.init_platform(); } diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c index 6bb84d655e4b..85eb5fc180c8 100644 --- a/arch/x86/kernel/cpu/mshyperv.c +++ b/arch/x86/kernel/cpu/mshyperv.c @@ -254,9 +254,9 @@ static void __init ms_hyperv_init_platform(void) #endif } -const __refconst struct hypervisor_x86 x86_hyper_ms_hyperv = { +const __initconst struct hypervisor_x86 x86_hyper_ms_hyperv = { .name = "Microsoft Hyper-V", .detect = ms_hyperv_platform, + .type = X86_HYPER_MS_HYPERV, .init.init_platform = ms_hyperv_init_platform, }; -EXPORT_SYMBOL(x86_hyper_ms_hyperv); diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c index 4804c1d063c8..8e005329648b 100644 --- a/arch/x86/kernel/cpu/vmware.c +++ b/arch/x86/kernel/cpu/vmware.c @@ -205,10 +205,10 @@ static bool __init vmware_legacy_x2apic_available(void) (eax & (1 << VMWARE_PORT_CMD_LEGACY_X2APIC)) != 0; } -const __refconst struct hypervisor_x86 x86_hyper_vmware = { +const __initconst struct hypervisor_x86 x86_hyper_vmware = { .name = "VMware", .detect = vmware_platform, + .type = X86_HYPER_VMWARE, .init.init_platform = vmware_platform_setup, .init.x2apic_available = vmware_legacy_x2apic_available, }; -EXPORT_SYMBOL(x86_hyper_vmware); diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 9dca8437c795..a94de09edbed 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -544,12 +544,12 @@ static uint32_t __init kvm_detect(void) return kvm_cpuid_base(); } -const struct hypervisor_x86 x86_hyper_kvm __refconst = { +const __initconst struct hypervisor_x86 x86_hyper_kvm = { .name = "KVM", .detect = kvm_detect, + .type = X86_HYPER_KVM, .init.x2apic_available = kvm_para_available, }; -EXPORT_SYMBOL_GPL(x86_hyper_kvm); static __init int activate_jump_labels(void) { diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c index 7b1622089f96..754d5391d9fa 100644 --- a/arch/x86/xen/enlighten_hvm.c +++ b/arch/x86/xen/enlighten_hvm.c @@ -226,12 +226,12 @@ static uint32_t __init xen_platform_hvm(void) return xen_cpuid_base(); } -const struct hypervisor_x86 
x86_hyper_xen_hvm = { +const __initconst struct hypervisor_x86 x86_hyper_xen_hvm = { .name = "Xen HVM", .detect = xen_platform_hvm, + .type = X86_HYPER_XEN_HVM, .init.init_platform = xen_hvm_guest_init, .init.x2apic_available = xen_x2apic_para_available, .init.init_mem_mapping = xen_hvm_init_mem_mapping, .runtime.pin_vcpu = xen_pin_vcpu, }; -EXPORT_SYMBOL(x86_hyper_xen_hvm); diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 57d63b0f1c0c..fbd054d6ac97 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -1459,9 +1459,9 @@ static uint32_t __init xen_platform_pv(void) return 0; } -const struct hypervisor_x86 x86_hyper_xen_pv = { +const __initconst struct hypervisor_x86 x86_hyper_xen_pv = { .name = "Xen PV", .detect = xen_platform_pv, + .type = X86_HYPER_XEN_PV, .runtime.pin_vcpu = xen_pin_vcpu, }; -EXPORT_SYMBOL(x86_hyper_xen_pv); diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c index 937801ac2fe0..2cd134dd94d2 100644 --- a/drivers/hv/vmbus_drv.c +++ b/drivers/hv/vmbus_drv.c @@ -1534,7 +1534,7 @@ static int __init hv_acpi_init(void) { int ret, t; - if (x86_hyper != &x86_hyper_ms_hyperv) + if (x86_hyper_type != X86_HYPER_MS_HYPERV) return -ENODEV; init_completion(&probe_event); diff --git a/drivers/input/mouse/vmmouse.c b/drivers/input/mouse/vmmouse.c index 0f586780ceb4..1ae5c1ef3f5b 100644 --- a/drivers/input/mouse/vmmouse.c +++ b/drivers/input/mouse/vmmouse.c @@ -316,11 +316,9 @@ static int vmmouse_enable(struct psmouse *psmouse) /* * Array of supported hypervisors. */ -static const struct hypervisor_x86 *vmmouse_supported_hypervisors[] = { - &x86_hyper_vmware, -#ifdef CONFIG_KVM_GUEST - &x86_hyper_kvm, -#endif +static enum x86_hypervisor_type vmmouse_supported_hypervisors[] = { + X86_HYPER_VMWARE, + X86_HYPER_KVM, }; /** @@ -331,7 +329,7 @@ static bool vmmouse_check_hypervisor(void) int i; for (i = 0; i < ARRAY_SIZE(vmmouse_supported_hypervisors); i++) - if (vmmouse_supported_hypervisors[i] == x86_hyper) + if (vmmouse_supported_hypervisors[i] == x86_hyper_type) return true; return false; diff --git a/drivers/misc/vmw_balloon.c b/drivers/misc/vmw_balloon.c index 1e688bfec567..9047c0a529b2 100644 --- a/drivers/misc/vmw_balloon.c +++ b/drivers/misc/vmw_balloon.c @@ -1271,7 +1271,7 @@ static int __init vmballoon_init(void) * Check if we are running on VMware's hypervisor and bail out * if we are not. */ - if (x86_hyper != &x86_hyper_vmware) + if (x86_hyper_type != X86_HYPER_VMWARE) return -ENODEV; for (is_2m_pages = 0; is_2m_pages < VMW_BALLOON_NUM_PAGE_SIZES; From 330a4f53bbb4add9c998344af15c27afc9b1e825 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Tue, 5 Dec 2017 14:14:47 +0100 Subject: [PATCH 0620/1013] drivers/misc/intel/pti: Rename the header file to free up the namespace commit 1784f9144b143a1e8b19fe94083b040aa559182b upstream. We'd like to use the 'PTI' acronym for 'Page Table Isolation' - free up the namespace by renaming the driver header to . (Also standardize the header guard name while at it.) 
Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: J Freyensee Cc: Greg Kroah-Hartman Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- drivers/misc/pti.c | 2 +- include/linux/{pti.h => intel-pti.h} | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) rename include/linux/{pti.h => intel-pti.h} (94%) diff --git a/drivers/misc/pti.c b/drivers/misc/pti.c index eda38cbe8530..41f2a9f6851d 100644 --- a/drivers/misc/pti.c +++ b/drivers/misc/pti.c @@ -32,7 +32,7 @@ #include #include #include -#include +#include #include #include diff --git a/include/linux/pti.h b/include/linux/intel-pti.h similarity index 94% rename from include/linux/pti.h rename to include/linux/intel-pti.h index b3ea01a3197e..2710d72de3c9 100644 --- a/include/linux/pti.h +++ b/include/linux/intel-pti.h @@ -22,8 +22,8 @@ * interface to write out it's contents for debugging a mobile system. */ -#ifndef PTI_H_ -#define PTI_H_ +#ifndef LINUX_INTEL_PTI_H_ +#define LINUX_INTEL_PTI_H_ /* offset for last dword of any PTI message. Part of MIPI P1149.7 */ #define PTI_LASTDWORD_DTS 0x30 @@ -40,4 +40,4 @@ struct pti_masterchannel *pti_request_masterchannel(u8 type, const char *thread_name); void pti_release_masterchannel(struct pti_masterchannel *mc); -#endif /*PTI_H_*/ +#endif /* LINUX_INTEL_PTI_H_ */ From c6e38628af6d6e06d6b6038c00ed140323e3ba97 Mon Sep 17 00:00:00 2001 From: Ricardo Neri Date: Sun, 5 Nov 2017 18:27:51 -0800 Subject: [PATCH 0621/1013] x86/cpufeature: Add User-Mode Instruction Prevention definitions commit a8b4db562e7283a1520f9e9730297ecaab7622ea upstream. [ Note, this is a Git cherry-pick of the following commit: (limited to the cpufeatures.h file) 3522c2a6a4f3 ("x86/cpufeature: Add User-Mode Instruction Prevention definitions") ... for easier x86 PTI code testing and back-porting. ] User-Mode Instruction Prevention is a security feature present in new Intel processors that, when set, prevents the execution of a subset of instructions if such instructions are executed in user mode (CPL > 0). Attempting to execute such instructions causes a general protection exception. The subset of instructions comprises: * SGDT - Store Global Descriptor Table * SIDT - Store Interrupt Descriptor Table * SLDT - Store Local Descriptor Table * SMSW - Store Machine Status Word * STR - Store Task Register This feature is also added to the list of disabled-features to allow a cleaner handling of build-time configuration. Signed-off-by: Ricardo Neri Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Brian Gerst Cc: Chen Yucong Cc: Chris Metcalf Cc: Dave Hansen Cc: Denys Vlasenko Cc: Fenghua Yu Cc: H. Peter Anvin Cc: Huang Rui Cc: Jiri Slaby Cc: Jonathan Corbet Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Michael S. Tsirkin Cc: Paolo Bonzini Cc: Paul Gortmaker Cc: Peter Zijlstra Cc: Ravi V. 
Shankar Cc: Shuah Khan Cc: Tony Luck Cc: Vlastimil Babka Cc: ricardo.neri@intel.com Link: http://lkml.kernel.org/r/1509935277-22138-7-git-send-email-ricardo.neri-calderon@linux.intel.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index cdf5be866863..c0b0e9e8aa66 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -296,6 +296,7 @@ /* Intel-defined CPU features, CPUID level 0x00000007:0 (ECX), word 16 */ #define X86_FEATURE_AVX512VBMI (16*32+ 1) /* AVX512 Vector Bit Manipulation instructions*/ +#define X86_FEATURE_UMIP (16*32+ 2) /* User Mode Instruction Protection */ #define X86_FEATURE_PKU (16*32+ 3) /* Protection Keys for Userspace */ #define X86_FEATURE_OSPKE (16*32+ 4) /* OS Protection Keys Enable */ #define X86_FEATURE_AVX512_VBMI2 (16*32+ 6) /* Additional AVX512 Vector Bit Manipulation Instructions */ From e918424231eef4f6652c2221ba2f1d6b44e04233 Mon Sep 17 00:00:00 2001 From: Rudolf Marek Date: Tue, 28 Nov 2017 22:01:06 +0100 Subject: [PATCH 0622/1013] x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD commit f2dbad36c55e5d3a91dccbde6e8cae345fe5632f upstream. [ Note, this is a Git cherry-pick of the following commit: 2b67799bdf25 ("x86: Make X86_BUG_FXSAVE_LEAK detectable in CPUID on AMD") ... for easier x86 PTI code testing and back-porting. ] The latest AMD AMD64 Architecture Programmer's Manual adds a CPUID feature XSaveErPtr (CPUID_Fn80000008_EBX[2]). If this feature is set, the FXSAVE, XSAVE, FXSAVEOPT, XSAVEC, XSAVES / FXRSTOR, XRSTOR, XRSTORS always save/restore error pointers, thus making the X86_BUG_FXSAVE_LEAK workaround obsolete on such CPUs. 
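For reference, the bit can also be probed from user space; a hedged sketch using the compiler's <cpuid.h> helper follows (the leaf and the EBX bit position are taken from the text above, everything else is illustrative):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
		return 1;	/* extended leaf not available */

	if (ebx & (1u << 2))
		printf("XSaveErPtr present: FXSAVE leak workaround not needed\n");
	else
		printf("XSaveErPtr absent: X86_BUG_FXSAVE_LEAK workaround applies\n");

	return 0;
}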
Signed-Off-By: Rudolf Marek Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Tested-by: Borislav Petkov Cc: Andy Lutomirski Link: https://lkml.kernel.org/r/bdcebe90-62c5-1f05-083c-eba7f08b2540@assembler.cz Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/kernel/cpu/amd.c | 7 +++++-- 2 files changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index c0b0e9e8aa66..800104c8a3ed 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -266,6 +266,7 @@ /* AMD-defined CPU features, CPUID level 0x80000008 (EBX), word 13 */ #define X86_FEATURE_CLZERO (13*32+ 0) /* CLZERO instruction */ #define X86_FEATURE_IRPERF (13*32+ 1) /* Instructions Retired Count */ +#define X86_FEATURE_XSAVEERPTR (13*32+ 2) /* Always save/restore FP error pointers */ /* Thermal and Power Management Leaf, CPUID level 0x00000006 (EAX), word 14 */ #define X86_FEATURE_DTHERM (14*32+ 0) /* Digital Thermal Sensor */ diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index d58184b7cd44..bcb75dc97d44 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -804,8 +804,11 @@ static void init_amd(struct cpuinfo_x86 *c) case 0x17: init_amd_zn(c); break; } - /* Enable workaround for FXSAVE leak */ - if (c->x86 >= 6) + /* + * Enable workaround for FXSAVE leak on CPUs + * without a XSaveErPtr feature + */ + if ((c->x86 >= 6) && (!cpu_has(c, X86_FEATURE_XSAVEERPTR))) set_cpu_bug(c, X86_BUG_FXSAVE_LEAK); cpu_detect_cache_sizes(c); From 2d8c24ed9310b89921bd0459d8df1258134015d4 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Thu, 31 Aug 2017 14:46:30 -0700 Subject: [PATCH 0623/1013] perf/x86: Enable free running PEBS for REGS_USER/INTR commit 2fe1bc1f501d55e5925b4035bcd85781adc76c63 upstream. [ Note, this is a Git cherry-pick of the following commit: a47ba4d77e12 ("perf/x86: Enable free running PEBS for REGS_USER/INTR") ... for easier x86 PTI code testing and back-porting. ] Currently free running PEBS is disabled when user or interrupt registers are requested. Most of the registers are actually available in the PEBS record and can be supported. So we just need to check for the supported registers and then allow it: it is all except for the segment register. For user registers this only works when the counter is limited to ring 3 only, so this also needs to be checked. 
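A hedged user-space sketch of an event configuration that satisfies those constraints (kernel samples excluded, only general-purpose user registers requested) might look like the following; the register mask and precise_ip value are illustrative choices, not requirements, and whether the fast path is actually used remains CPU and kernel dependent:

#include <asm/perf_regs.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_cycles_user_regs(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.sample_period = 100000;
	attr.sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_REGS_USER;
	/* GP registers only: segment registers are not in the PEBS record. */
	attr.sample_regs_user = (1ULL << PERF_REG_X86_IP) |
				(1ULL << PERF_REG_X86_SP) |
				(1ULL << PERF_REG_X86_AX);
	attr.exclude_kernel = 1;	/* ring 3 only */
	attr.exclude_hv = 1;
	attr.precise_ip = 2;		/* request PEBS sampling */

	return syscall(__NR_perf_event_open, &attr, 0 /* self */,
		       -1 /* any cpu */, -1 /* no group */, 0);
}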
Signed-off-by: Andi Kleen Signed-off-by: Peter Zijlstra (Intel) Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20170831214630.21892-1-andi@firstfloor.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/intel/core.c | 4 ++++ arch/x86/events/perf_event.h | 24 +++++++++++++++++++++++- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index f94855000d4e..09c26a4f139c 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -2958,6 +2958,10 @@ static unsigned long intel_pmu_free_running_flags(struct perf_event *event) if (event->attr.use_clockid) flags &= ~PERF_SAMPLE_TIME; + if (!event->attr.exclude_kernel) + flags &= ~PERF_SAMPLE_REGS_USER; + if (event->attr.sample_regs_user & ~PEBS_REGS) + flags &= ~(PERF_SAMPLE_REGS_USER | PERF_SAMPLE_REGS_INTR); return flags; } diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 4196f81ec0e1..f7aaadf9331f 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -85,13 +85,15 @@ struct amd_nb { * Flags PEBS can handle without an PMI. * * TID can only be handled by flushing at context switch. + * REGS_USER can be handled for events limited to ring 3. + * */ #define PEBS_FREERUNNING_FLAGS \ (PERF_SAMPLE_IP | PERF_SAMPLE_TID | PERF_SAMPLE_ADDR | \ PERF_SAMPLE_ID | PERF_SAMPLE_CPU | PERF_SAMPLE_STREAM_ID | \ PERF_SAMPLE_DATA_SRC | PERF_SAMPLE_IDENTIFIER | \ - PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR) + PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \ + PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER) /* * A debug store configuration. @@ -110,6 +112,26 @@ struct debug_store { u64 pebs_event_reset[MAX_PEBS_EVENTS]; }; +#define PEBS_REGS \ + (PERF_REG_X86_AX | \ + PERF_REG_X86_BX | \ + PERF_REG_X86_CX | \ + PERF_REG_X86_DX | \ + PERF_REG_X86_DI | \ + PERF_REG_X86_SI | \ + PERF_REG_X86_SP | \ + PERF_REG_X86_BP | \ + PERF_REG_X86_IP | \ + PERF_REG_X86_FLAGS | \ + PERF_REG_X86_R8 | \ + PERF_REG_X86_R9 | \ + PERF_REG_X86_R10 | \ + PERF_REG_X86_R11 | \ + PERF_REG_X86_R12 | \ + PERF_REG_X86_R13 | \ + PERF_REG_X86_R14 | \ + PERF_REG_X86_R15) + /* * Per register state. */ From 065060cdd3de1f7a1f08284999081c4c38601ef7 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Tue, 12 Dec 2017 02:25:31 +0100 Subject: [PATCH 0624/1013] bpf: fix build issues on um due to mising bpf_perf_event.h commit ab95477e7cb35557ecfc837687007b646bab9a9f upstream. [ Note, this is a Git cherry-pick of the following commit: a23f06f06dbe ("bpf: fix build issues on um due to mising bpf_perf_event.h") ... for easier x86 PTI code testing and back-porting. ] Since c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build on i386 or x86_64: [...] CC init/main.o In file included from ../include/linux/perf_event.h:18:0, from ../include/linux/trace_events.h:10, from ../include/trace/syscall.h:7, from ../include/linux/syscalls.h:82, from ../init/main.c:20: ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error: asm/bpf_perf_event.h: No such file or directory #include [...] Let's add the missing bpf_perf_event.h to the um arch as well. This seems to be the only one still missing. 
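For background (a sketch of the usual kbuild convention, not a file copied from the tree): a generic-y entry makes kbuild generate a one-line wrapper header under arch/um/include/generated/asm/ at build time, which simply redirects to the asm-generic copy:

/* arch/um/include/generated/asm/bpf_perf_event.h -- generated by kbuild */
#include <asm-generic/bpf_perf_event.h>

That wrapper is all the um build needs to satisfy the asm/bpf_perf_event.h include introduced by c895f6f703ad.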
Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type") Reported-by: Randy Dunlap Suggested-by: Richard Weinberger Signed-off-by: Daniel Borkmann Tested-by: Randy Dunlap Cc: Hendrik Brueckner Cc: Richard Weinberger Acked-by: Alexei Starovoitov Acked-by: Richard Weinberger Signed-off-by: Alexei Starovoitov Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/um/include/asm/Kbuild | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild index 50a32c33d729..73c57f614c9e 100644 --- a/arch/um/include/asm/Kbuild +++ b/arch/um/include/asm/Kbuild @@ -1,4 +1,5 @@ generic-y += barrier.h +generic-y += bpf_perf_event.h generic-y += bug.h generic-y += clkdev.h generic-y += current.h From 1aedecaf12a67de999eaf233cdb64839170ca035 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 24 Oct 2017 11:22:47 +0100 Subject: [PATCH 0625/1013] locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE() commit c2bc66082e1048c7573d72e62f597bdc5ce13fea upstream. [ Note, this is a Git cherry-pick of the following commit: 76ebbe78f739 ("locking/barriers: Add implicit smp_read_barrier_depends() to READ_ONCE()") ... for easier x86 PTI code testing and back-porting. ] In preparation for the removal of lockless_dereference(), which is the same as READ_ONCE() on all architectures other than Alpha, add an implicit smp_read_barrier_depends() to READ_ONCE() so that it can be used to head dependency chains on all architectures. Signed-off-by: Will Deacon Cc: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1508840570-22169-3-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- include/linux/compiler.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 202710420d6d..712cd8bb00b4 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -341,6 +341,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s __read_once_size(&(x), __u.__c, sizeof(x)); \ else \ __read_once_size_nocheck(&(x), __u.__c, sizeof(x)); \ + smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \ __u.__val; \ }) #define READ_ONCE(x) __READ_ONCE(x, 1) From 5383f45db38c0459f8d3d3f90f7f0a132ad104f3 Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 24 Oct 2017 11:22:48 +0100 Subject: [PATCH 0626/1013] locking/barriers: Convert users of lockless_dereference() to READ_ONCE() commit 3382290ed2d5e275429cef510ab21889d3ccd164 upstream. [ Note, this is a Git cherry-pick of the following commit: 506458efaf15 ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()") ... for easier x86 PTI code testing and back-porting. ] READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: Will Deacon Cc: Linus Torvalds Cc: Paul E. 
McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/core.c | 2 +- arch/x86/include/asm/mmu_context.h | 4 ++-- arch/x86/kernel/ldt.c | 2 +- drivers/md/dm-mpath.c | 20 ++++++++++---------- fs/dcache.c | 4 ++-- fs/overlayfs/ovl_entry.h | 2 +- fs/overlayfs/readdir.c | 2 +- include/linux/rculist.h | 4 ++-- include/linux/rcupdate.h | 4 ++-- kernel/events/core.c | 4 ++-- kernel/seccomp.c | 2 +- kernel/task_work.c | 2 +- mm/slab.h | 2 +- 13 files changed, 27 insertions(+), 27 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 80534d3c2480..589af1eec7c1 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2371,7 +2371,7 @@ static unsigned long get_segment_base(unsigned int segment) struct ldt_struct *ldt; /* IRQs are off, so this synchronizes with smp_store_release */ - ldt = lockless_dereference(current->active_mm->context.ldt); + ldt = READ_ONCE(current->active_mm->context.ldt); if (!ldt || idx >= ldt->nr_entries) return 0; diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 6699fc441644..6d16d15d09a0 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -73,8 +73,8 @@ static inline void load_mm_ldt(struct mm_struct *mm) #ifdef CONFIG_MODIFY_LDT_SYSCALL struct ldt_struct *ldt; - /* lockless_dereference synchronizes with smp_store_release */ - ldt = lockless_dereference(mm->context.ldt); + /* READ_ONCE synchronizes with smp_store_release */ + ldt = READ_ONCE(mm->context.ldt); /* * Any change to mm->context.ldt is followed by an IPI to all diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index ae5615b03def..1c1eae961340 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -103,7 +103,7 @@ static void finalize_ldt_struct(struct ldt_struct *ldt) static void install_ldt(struct mm_struct *current_mm, struct ldt_struct *ldt) { - /* Synchronizes with lockless_dereference in load_mm_ldt. */ + /* Synchronizes with READ_ONCE in load_mm_ldt. */ smp_store_release(¤t_mm->context.ldt, ldt); /* Activate the LDT for all CPUs using current_mm. */ diff --git a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c index 35e82b14ded7..ddf0a4341ae2 100644 --- a/drivers/md/dm-mpath.c +++ b/drivers/md/dm-mpath.c @@ -366,7 +366,7 @@ static struct pgpath *choose_path_in_pg(struct multipath *m, pgpath = path_to_pgpath(path); - if (unlikely(lockless_dereference(m->current_pg) != pg)) { + if (unlikely(READ_ONCE(m->current_pg) != pg)) { /* Only update current_pgpath if pg changed */ spin_lock_irqsave(&m->lock, flags); m->current_pgpath = pgpath; @@ -390,7 +390,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes) } /* Were we instructed to switch PG? */ - if (lockless_dereference(m->next_pg)) { + if (READ_ONCE(m->next_pg)) { spin_lock_irqsave(&m->lock, flags); pg = m->next_pg; if (!pg) { @@ -406,7 +406,7 @@ static struct pgpath *choose_pgpath(struct multipath *m, size_t nr_bytes) /* Don't change PG until it has no remaining paths */ check_current_pg: - pg = lockless_dereference(m->current_pg); + pg = READ_ONCE(m->current_pg); if (pg) { pgpath = choose_path_in_pg(m, pg, nr_bytes); if (!IS_ERR_OR_NULL(pgpath)) @@ -473,7 +473,7 @@ static int multipath_clone_and_map(struct dm_target *ti, struct request *rq, struct request *clone; /* Do we need to select a new pgpath? 
*/ - pgpath = lockless_dereference(m->current_pgpath); + pgpath = READ_ONCE(m->current_pgpath); if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags)) pgpath = choose_pgpath(m, nr_bytes); @@ -533,7 +533,7 @@ static int __multipath_map_bio(struct multipath *m, struct bio *bio, struct dm_m bool queue_io; /* Do we need to select a new pgpath? */ - pgpath = lockless_dereference(m->current_pgpath); + pgpath = READ_ONCE(m->current_pgpath); queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags); if (!pgpath || !queue_io) pgpath = choose_pgpath(m, nr_bytes); @@ -1802,7 +1802,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti, struct pgpath *current_pgpath; int r; - current_pgpath = lockless_dereference(m->current_pgpath); + current_pgpath = READ_ONCE(m->current_pgpath); if (!current_pgpath) current_pgpath = choose_pgpath(m, 0); @@ -1824,7 +1824,7 @@ static int multipath_prepare_ioctl(struct dm_target *ti, } if (r == -ENOTCONN) { - if (!lockless_dereference(m->current_pg)) { + if (!READ_ONCE(m->current_pg)) { /* Path status changed, redo selection */ (void) choose_pgpath(m, 0); } @@ -1893,9 +1893,9 @@ static int multipath_busy(struct dm_target *ti) return (m->queue_mode != DM_TYPE_MQ_REQUEST_BASED); /* Guess which priority_group will be used at next mapping time */ - pg = lockless_dereference(m->current_pg); - next_pg = lockless_dereference(m->next_pg); - if (unlikely(!lockless_dereference(m->current_pgpath) && next_pg)) + pg = READ_ONCE(m->current_pg); + next_pg = READ_ONCE(m->next_pg); + if (unlikely(!READ_ONCE(m->current_pgpath) && next_pg)) pg = next_pg; if (!pg) { diff --git a/fs/dcache.c b/fs/dcache.c index f90141387f01..34c852af215c 100644 --- a/fs/dcache.c +++ b/fs/dcache.c @@ -231,7 +231,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c { /* * Be careful about RCU walk racing with rename: - * use 'lockless_dereference' to fetch the name pointer. + * use 'READ_ONCE' to fetch the name pointer. * * NOTE! Even if a rename will mean that the length * was not loaded atomically, we don't care. 
The @@ -245,7 +245,7 @@ static inline int dentry_cmp(const struct dentry *dentry, const unsigned char *c * early because the data cannot match (there can * be no NUL in the ct/tcount data) */ - const unsigned char *cs = lockless_dereference(dentry->d_name.name); + const unsigned char *cs = READ_ONCE(dentry->d_name.name); return dentry_string_cmp(cs, ct, tcount); } diff --git a/fs/overlayfs/ovl_entry.h b/fs/overlayfs/ovl_entry.h index 25d9b5adcd42..36b49bd09264 100644 --- a/fs/overlayfs/ovl_entry.h +++ b/fs/overlayfs/ovl_entry.h @@ -77,5 +77,5 @@ static inline struct ovl_inode *OVL_I(struct inode *inode) static inline struct dentry *ovl_upperdentry_dereference(struct ovl_inode *oi) { - return lockless_dereference(oi->__upperdentry); + return READ_ONCE(oi->__upperdentry); } diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c index b2c7f33e08fc..d94a51dc4e32 100644 --- a/fs/overlayfs/readdir.c +++ b/fs/overlayfs/readdir.c @@ -757,7 +757,7 @@ static int ovl_dir_fsync(struct file *file, loff_t start, loff_t end, if (!od->is_upper && OVL_TYPE_UPPER(ovl_path_type(dentry))) { struct inode *inode = file_inode(file); - realfile = lockless_dereference(od->upperfile); + realfile = READ_ONCE(od->upperfile); if (!realfile) { struct path upperpath; diff --git a/include/linux/rculist.h b/include/linux/rculist.h index c2cdd45a880a..127f534fec94 100644 --- a/include/linux/rculist.h +++ b/include/linux/rculist.h @@ -275,7 +275,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, * primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock(). */ #define list_entry_rcu(ptr, type, member) \ - container_of(lockless_dereference(ptr), type, member) + container_of(READ_ONCE(ptr), type, member) /* * Where are list_empty_rcu() and list_first_entry_rcu()? @@ -368,7 +368,7 @@ static inline void list_splice_tail_init_rcu(struct list_head *list, * example is when items are added to the list, but never deleted. */ #define list_entry_lockless(ptr, type, member) \ - container_of((typeof(ptr))lockless_dereference(ptr), type, member) + container_of((typeof(ptr))READ_ONCE(ptr), type, member) /** * list_for_each_entry_lockless - iterate over rcu list of given type diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 1a9f70d44af9..a6ddc42f87a5 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -346,7 +346,7 @@ static inline void rcu_preempt_sleep_check(void) { } #define __rcu_dereference_check(p, c, space) \ ({ \ /* Dependency order vs. p above. */ \ - typeof(*p) *________p1 = (typeof(*p) *__force)lockless_dereference(p); \ + typeof(*p) *________p1 = (typeof(*p) *__force)READ_ONCE(p); \ RCU_LOCKDEP_WARN(!(c), "suspicious rcu_dereference_check() usage"); \ rcu_dereference_sparse(p, space); \ ((typeof(*p) __force __kernel *)(________p1)); \ @@ -360,7 +360,7 @@ static inline void rcu_preempt_sleep_check(void) { } #define rcu_dereference_raw(p) \ ({ \ /* Dependency order vs. p above. */ \ - typeof(p) ________p1 = lockless_dereference(p); \ + typeof(p) ________p1 = READ_ONCE(p); \ ((typeof(*p) __force __kernel *)(________p1)); \ }) diff --git a/kernel/events/core.c b/kernel/events/core.c index 4f1d4bfc607a..24ebad5567b4 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4233,7 +4233,7 @@ static void perf_remove_from_owner(struct perf_event *event) * indeed free this event, otherwise we need to serialize on * owner->perf_event_mutex. 
*/ - owner = lockless_dereference(event->owner); + owner = READ_ONCE(event->owner); if (owner) { /* * Since delayed_put_task_struct() also drops the last @@ -4330,7 +4330,7 @@ int perf_event_release_kernel(struct perf_event *event) * Cannot change, child events are not migrated, see the * comment with perf_event_ctx_lock_nested(). */ - ctx = lockless_dereference(child->ctx); + ctx = READ_ONCE(child->ctx); /* * Since child_mutex nests inside ctx::mutex, we must jump * through hoops. We start by grabbing a reference on the ctx. diff --git a/kernel/seccomp.c b/kernel/seccomp.c index 418a1c045933..5f0dfb2abb8d 100644 --- a/kernel/seccomp.c +++ b/kernel/seccomp.c @@ -190,7 +190,7 @@ static u32 seccomp_run_filters(const struct seccomp_data *sd, u32 ret = SECCOMP_RET_ALLOW; /* Make sure cross-thread synced filter points somewhere sane. */ struct seccomp_filter *f = - lockless_dereference(current->seccomp.filter); + READ_ONCE(current->seccomp.filter); /* Ensure unexpected behavior doesn't result in failing open. */ if (unlikely(WARN_ON(f == NULL))) diff --git a/kernel/task_work.c b/kernel/task_work.c index 5718b3ea202a..0fef395662a6 100644 --- a/kernel/task_work.c +++ b/kernel/task_work.c @@ -68,7 +68,7 @@ task_work_cancel(struct task_struct *task, task_work_func_t func) * we raced with task_work_run(), *pprev == NULL/exited. */ raw_spin_lock_irqsave(&task->pi_lock, flags); - while ((work = lockless_dereference(*pprev))) { + while ((work = READ_ONCE(*pprev))) { if (work->func != func) pprev = &work->next; else if (cmpxchg(pprev, work, work->next) == work) diff --git a/mm/slab.h b/mm/slab.h index 028cdc7df67e..86d7c7d860f9 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -259,7 +259,7 @@ cache_from_memcg_idx(struct kmem_cache *s, int idx) * memcg_caches issues a write barrier to match this (see * memcg_create_kmem_cache()). */ - cachep = lockless_dereference(arr->entries[idx]); + cachep = READ_ONCE(arr->entries[idx]); rcu_read_unlock(); return cachep; From d455b71e7393975c5c50c26bd2199f19c6eed80f Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Wed, 15 Nov 2017 17:36:35 -0800 Subject: [PATCH 0627/1013] x86/mm/kasan: Don't use vmemmap_populate() to initialize shadow commit 2aeb07365bcd489620f71390a7d2031cd4dfb83e upstream. [ Note, this is a Git cherry-pick of the following commit: d17a1d97dc20: ("x86/mm/kasan: don't use vmemmap_populate() to initialize shadow") ... for easier x86 PTI code testing and back-porting. ] The KASAN shadow is currently mapped using vmemmap_populate() since that provides a semi-convenient way to map pages into init_top_pgt. However, since that no longer zeroes the mapped pages, it is not suitable for KASAN, which requires zeroed shadow memory. Add kasan_populate_shadow() interface and use it instead of vmemmap_populate(). Besides, this allows us to take advantage of gigantic pages and use them to populate the shadow, which should save us some memory wasted on page tables and reduce TLB pressure. Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com Signed-off-by: Andrey Ryabinin Signed-off-by: Pavel Tatashin Cc: Andy Lutomirski Cc: Steven Sistare Cc: Daniel Jordan Cc: Bob Picco Cc: Michal Hocko Cc: Alexander Potapenko Cc: Ard Biesheuvel Cc: Catalin Marinas Cc: Christian Borntraeger Cc: David S. Miller Cc: Dmitry Vyukov Cc: Heiko Carstens Cc: "H. 
Peter Anvin" Cc: Ingo Molnar Cc: Mark Rutland Cc: Matthew Wilcox Cc: Mel Gorman Cc: Michal Hocko Cc: Sam Ravnborg Cc: Thomas Gleixner Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/Kconfig | 2 +- arch/x86/mm/kasan_init_64.c | 143 ++++++++++++++++++++++++++++++++++-- 2 files changed, 137 insertions(+), 8 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 59d5b3fa88e6..48646160eb83 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -108,7 +108,7 @@ config X86 select HAVE_ARCH_AUDITSYSCALL select HAVE_ARCH_HUGE_VMAP if X86_64 || X86_PAE select HAVE_ARCH_JUMP_LABEL - select HAVE_ARCH_KASAN if X86_64 && SPARSEMEM_VMEMMAP + select HAVE_ARCH_KASAN if X86_64 select HAVE_ARCH_KGDB select HAVE_ARCH_KMEMCHECK select HAVE_ARCH_MMAP_RND_BITS if MMU diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 2b60dc6e64b1..99dfed6dfef8 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -4,12 +4,14 @@ #include #include #include +#include #include #include #include #include #include +#include #include #include #include @@ -18,7 +20,134 @@ extern struct range pfn_mapped[E820_MAX_ENTRIES]; static p4d_t tmp_p4d_table[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE); -static int __init map_range(struct range *range) +static __init void *early_alloc(size_t size, int nid) +{ + return memblock_virt_alloc_try_nid_nopanic(size, size, + __pa(MAX_DMA_ADDRESS), BOOTMEM_ALLOC_ACCESSIBLE, nid); +} + +static void __init kasan_populate_pmd(pmd_t *pmd, unsigned long addr, + unsigned long end, int nid) +{ + pte_t *pte; + + if (pmd_none(*pmd)) { + void *p; + + if (boot_cpu_has(X86_FEATURE_PSE) && + ((end - addr) == PMD_SIZE) && + IS_ALIGNED(addr, PMD_SIZE)) { + p = early_alloc(PMD_SIZE, nid); + if (p && pmd_set_huge(pmd, __pa(p), PAGE_KERNEL)) + return; + else if (p) + memblock_free(__pa(p), PMD_SIZE); + } + + p = early_alloc(PAGE_SIZE, nid); + pmd_populate_kernel(&init_mm, pmd, p); + } + + pte = pte_offset_kernel(pmd, addr); + do { + pte_t entry; + void *p; + + if (!pte_none(*pte)) + continue; + + p = early_alloc(PAGE_SIZE, nid); + entry = pfn_pte(PFN_DOWN(__pa(p)), PAGE_KERNEL); + set_pte_at(&init_mm, addr, pte, entry); + } while (pte++, addr += PAGE_SIZE, addr != end); +} + +static void __init kasan_populate_pud(pud_t *pud, unsigned long addr, + unsigned long end, int nid) +{ + pmd_t *pmd; + unsigned long next; + + if (pud_none(*pud)) { + void *p; + + if (boot_cpu_has(X86_FEATURE_GBPAGES) && + ((end - addr) == PUD_SIZE) && + IS_ALIGNED(addr, PUD_SIZE)) { + p = early_alloc(PUD_SIZE, nid); + if (p && pud_set_huge(pud, __pa(p), PAGE_KERNEL)) + return; + else if (p) + memblock_free(__pa(p), PUD_SIZE); + } + + p = early_alloc(PAGE_SIZE, nid); + pud_populate(&init_mm, pud, p); + } + + pmd = pmd_offset(pud, addr); + do { + next = pmd_addr_end(addr, end); + if (!pmd_large(*pmd)) + kasan_populate_pmd(pmd, addr, next, nid); + } while (pmd++, addr = next, addr != end); +} + +static void __init kasan_populate_p4d(p4d_t *p4d, unsigned long addr, + unsigned long end, int nid) +{ + pud_t *pud; + unsigned long next; + + if (p4d_none(*p4d)) { + void *p = early_alloc(PAGE_SIZE, nid); + + p4d_populate(&init_mm, p4d, p); + } + + pud = pud_offset(p4d, addr); + do { + next = pud_addr_end(addr, end); + if (!pud_large(*pud)) + kasan_populate_pud(pud, addr, next, nid); + } while (pud++, addr = next, addr != end); +} + +static void __init kasan_populate_pgd(pgd_t *pgd, unsigned long 
addr, + unsigned long end, int nid) +{ + void *p; + p4d_t *p4d; + unsigned long next; + + if (pgd_none(*pgd)) { + p = early_alloc(PAGE_SIZE, nid); + pgd_populate(&init_mm, pgd, p); + } + + p4d = p4d_offset(pgd, addr); + do { + next = p4d_addr_end(addr, end); + kasan_populate_p4d(p4d, addr, next, nid); + } while (p4d++, addr = next, addr != end); +} + +static void __init kasan_populate_shadow(unsigned long addr, unsigned long end, + int nid) +{ + pgd_t *pgd; + unsigned long next; + + addr = addr & PAGE_MASK; + end = round_up(end, PAGE_SIZE); + pgd = pgd_offset_k(addr); + do { + next = pgd_addr_end(addr, end); + kasan_populate_pgd(pgd, addr, next, nid); + } while (pgd++, addr = next, addr != end); +} + +static void __init map_range(struct range *range) { unsigned long start; unsigned long end; @@ -26,7 +155,7 @@ static int __init map_range(struct range *range) start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start)); end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end)); - return vmemmap_populate(start, end, NUMA_NO_NODE); + kasan_populate_shadow(start, end, early_pfn_to_nid(range->start)); } static void __init clear_pgds(unsigned long start, @@ -189,16 +318,16 @@ void __init kasan_init(void) if (pfn_mapped[i].end == 0) break; - if (map_range(&pfn_mapped[i])) - panic("kasan: unable to allocate shadow!"); + map_range(&pfn_mapped[i]); } + kasan_populate_zero_shadow( kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), kasan_mem_to_shadow((void *)__START_KERNEL_map)); - vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext), - (unsigned long)kasan_mem_to_shadow(_end), - NUMA_NO_NODE); + kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext), + (unsigned long)kasan_mem_to_shadow(_end), + early_pfn_to_nid(__pa(_stext))); kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), (void *)KASAN_SHADOW_END); From 21ddc15fa82b7a2a7b6209437c4d137672370c86 Mon Sep 17 00:00:00 2001 From: Boris Ostrovsky Date: Mon, 4 Dec 2017 15:07:07 +0100 Subject: [PATCH 0628/1013] x86/entry/64/paravirt: Use paravirt-safe macro to access eflags commit e17f8234538d1ff708673f287a42457c4dee720d upstream. Commit 1d3e53e8624a ("x86/entry/64: Refactor IRQ stacks and make them NMI-safe") added DEBUG_ENTRY_ASSERT_IRQS_OFF macro that acceses eflags using 'pushfq' instruction when testing for IF bit. On PV Xen guests looking at IF flag directly will always see it set, resulting in 'ud2'. Introduce SAVE_FLAGS() macro that will use appropriate save_fl pv op when running paravirt. Signed-off-by: Boris Ostrovsky Signed-off-by: Thomas Gleixner Reviewed-by: Juergen Gross Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: xen-devel@lists.xenproject.org Link: https://lkml.kernel.org/r/20171204150604.899457242@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 7 ++++--- arch/x86/include/asm/irqflags.h | 3 +++ arch/x86/include/asm/paravirt.h | 9 +++++++++ arch/x86/kernel/asm-offsets_64.c | 3 +++ 4 files changed, 19 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 5063ed1214dd..162ce952d716 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -462,12 +462,13 @@ END(irq_entries_start) .macro DEBUG_ENTRY_ASSERT_IRQS_OFF #ifdef CONFIG_DEBUG_ENTRY - pushfq - testl $X86_EFLAGS_IF, (%rsp) + pushq %rax + SAVE_FLAGS(CLBR_RAX) + testl $X86_EFLAGS_IF, %eax jz .Lokay_\@ ud2 .Lokay_\@: - addq $8, %rsp + popq %rax #endif .endm diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index c8ef23f2c28f..89f08955fff7 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -142,6 +142,9 @@ static inline notrace unsigned long arch_local_irq_save(void) swapgs; \ sysretl +#ifdef CONFIG_DEBUG_ENTRY +#define SAVE_FLAGS(x) pushfq; popq %rax +#endif #else #define INTERRUPT_RETURN iret #define ENABLE_INTERRUPTS_SYSEXIT sti; sysexit diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 283efcaac8af..892df375b615 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -927,6 +927,15 @@ extern void default_banner(void); PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_usergs_sysret64), \ CLBR_NONE, \ jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_usergs_sysret64)) + +#ifdef CONFIG_DEBUG_ENTRY +#define SAVE_FLAGS(clobbers) \ + PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_save_fl), clobbers, \ + PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE); \ + call PARA_INDIRECT(pv_irq_ops+PV_IRQ_save_fl); \ + PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);) +#endif + #endif /* CONFIG_X86_32 */ #endif /* __ASSEMBLY__ */ diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c index 630212fa9b9d..e3a5175a444b 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -23,6 +23,9 @@ int main(void) #ifdef CONFIG_PARAVIRT OFFSET(PV_CPU_usergs_sysret64, pv_cpu_ops, usergs_sysret64); OFFSET(PV_CPU_swapgs, pv_cpu_ops, swapgs); +#ifdef CONFIG_DEBUG_ENTRY + OFFSET(PV_IRQ_save_fl, pv_irq_ops, save_fl); +#endif BLANK(); #endif From 40ddc692b5a11cf7c3f9677744676af617d54a24 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:08 +0100 Subject: [PATCH 0629/1013] x86/unwinder/orc: Dont bail on stack overflow commit d3a09104018cf2ad5973dfa8a9c138ef9f5015a3 upstream. If the stack overflows into a guard page and the ORC unwinder should work well: by construction, there can't be any meaningful data in the guard page because no writes to the guard page will have succeeded. But there is a bug that prevents unwinding from working correctly: if the starting register state has RSP pointing into a stack guard page, the ORC unwinder bails out immediately. Instead of bailing out immediately check whether the next page up is a valid check page and if so analyze that. As a result the ORC unwinder will start the unwind. Tested by intentionally overflowing the task stack. 
The result is an accurate call trace instead of a trace consisting purely of '?' entries. There are a few other bugs that are triggered if the unwinder encounters a stack overflow after the first step, but they are outside the scope of this fix. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150604.991389777@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/unwind_orc.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index a3f973b2c97a..ff8e1132b2ae 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -553,8 +553,18 @@ void __unwind_start(struct unwind_state *state, struct task_struct *task, } if (get_stack_info((unsigned long *)state->sp, state->task, - &state->stack_info, &state->stack_mask)) - return; + &state->stack_info, &state->stack_mask)) { + /* + * We weren't on a valid stack. It's possible that + * we overflowed a valid stack into a guard page. + * See if the next page up is valid so that we can + * generate some kind of backtrace if this happens. + */ + void *next_page = (void *)PAGE_ALIGN((unsigned long)state->sp); + if (get_stack_info(next_page, state->task, &state->stack_info, + &state->stack_mask)) + return; + } /* * The caller can provide the address of the first frame directly From 5209e8ac937261925c12db63a28cfaa033fa30ed Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 4 Dec 2017 15:07:09 +0100 Subject: [PATCH 0630/1013] x86/unwinder: Handle stack overflows more gracefully commit b02fcf9ba1211097754b286043cd87a8b4907e75 upstream. There are at least two unwinder bugs hindering the debugging of stack-overflow crashes: - It doesn't deal gracefully with the case where the stack overflows and the stack pointer itself isn't on a valid stack but the to-be-dereferenced data *is*. - The ORC oops dump code doesn't know how to print partial pt_regs, for the case where we get an interrupt/exception in *early* entry code before the full pt_regs have been saved. Fix both issues. http://lkml.kernel.org/r/20171126024031.uxi4numpbjm5rlbr@treble Signed-off-by: Josh Poimboeuf Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.071425003@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/kdebug.h | 1 + arch/x86/include/asm/unwind.h | 7 ++++ arch/x86/kernel/dumpstack.c | 32 ++++++++++++--- arch/x86/kernel/process_64.c | 11 +++--- arch/x86/kernel/unwind_orc.c | 74 ++++++++++++----------------------- 5 files changed, 65 insertions(+), 60 deletions(-) diff --git a/arch/x86/include/asm/kdebug.h b/arch/x86/include/asm/kdebug.h index f86a8caa561e..395c9631e000 100644 --- a/arch/x86/include/asm/kdebug.h +++ b/arch/x86/include/asm/kdebug.h @@ -26,6 +26,7 @@ extern void die(const char *, struct pt_regs *,long); extern int __must_check __die(const char *, struct pt_regs *, long); extern void show_stack_regs(struct pt_regs *regs); extern void __show_regs(struct pt_regs *regs, int all); +extern void show_iret_regs(struct pt_regs *regs); extern unsigned long oops_begin(void); extern void oops_end(unsigned long, struct pt_regs *, int signr); diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index e9cc6fe1fc6f..c1688c2d0a12 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -7,6 +7,9 @@ #include #include +#define IRET_FRAME_OFFSET (offsetof(struct pt_regs, ip)) +#define IRET_FRAME_SIZE (sizeof(struct pt_regs) - IRET_FRAME_OFFSET) + struct unwind_state { struct stack_info stack_info; unsigned long stack_mask; @@ -52,6 +55,10 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, } #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER) +/* + * WARNING: The entire pt_regs may not be safe to dereference. In some cases, + * only the iret frame registers are accessible. Use with caution! + */ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) { if (unwind_done(state)) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index f13b4c00a5de..0bc95be5c638 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -50,6 +50,28 @@ static void printk_stack_address(unsigned long address, int reliable, printk("%s %s%pB\n", log_lvl, reliable ? "" : "? ", (void *)address); } +void show_iret_regs(struct pt_regs *regs) +{ + printk(KERN_DEFAULT "RIP: %04x:%pS\n", (int)regs->cs, (void *)regs->ip); + printk(KERN_DEFAULT "RSP: %04x:%016lx EFLAGS: %08lx", (int)regs->ss, + regs->sp, regs->flags); +} + +static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) +{ + if (on_stack(info, regs, sizeof(*regs))) + __show_regs(regs, 0); + else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET, + IRET_FRAME_SIZE)) { + /* + * When an interrupt or exception occurs in entry code, the + * full pt_regs might not have been saved yet. In that case + * just print the iret frame. + */ + show_iret_regs(regs); + } +} + void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, unsigned long *stack, char *log_lvl) { @@ -94,8 +116,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, if (stack_name) printk("%s <%s>\n", log_lvl, stack_name); - if (regs && on_stack(&stack_info, regs, sizeof(*regs))) - __show_regs(regs, 0); + if (regs) + show_regs_safe(&stack_info, regs); /* * Scan the stack, printing any text addresses we find. 
At the @@ -119,7 +141,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* * Don't print regs->ip again if it was already printed - * by __show_regs() below. + * by show_regs_safe() below. */ if (regs && stack == ®s->ip) goto next; @@ -155,8 +177,8 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* if the frame has entry regs, print them */ regs = unwind_get_entry_regs(&state); - if (regs && on_stack(&stack_info, regs, sizeof(*regs))) - __show_regs(regs, 0); + if (regs) + show_regs_safe(&stack_info, regs); } if (stack_name) diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index eeeb34f85c25..01b119bebb68 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -69,9 +69,8 @@ void __show_regs(struct pt_regs *regs, int all) unsigned int fsindex, gsindex; unsigned int ds, cs, es; - printk(KERN_DEFAULT "RIP: %04lx:%pS\n", regs->cs, (void *)regs->ip); - printk(KERN_DEFAULT "RSP: %04lx:%016lx EFLAGS: %08lx", regs->ss, - regs->sp, regs->flags); + show_iret_regs(regs); + if (regs->orig_ax != -1) pr_cont(" ORIG_RAX: %016lx\n", regs->orig_ax); else @@ -88,6 +87,9 @@ void __show_regs(struct pt_regs *regs, int all) printk(KERN_DEFAULT "R13: %016lx R14: %016lx R15: %016lx\n", regs->r13, regs->r14, regs->r15); + if (!all) + return; + asm("movl %%ds,%0" : "=r" (ds)); asm("movl %%cs,%0" : "=r" (cs)); asm("movl %%es,%0" : "=r" (es)); @@ -98,9 +100,6 @@ void __show_regs(struct pt_regs *regs, int all) rdmsrl(MSR_GS_BASE, gs); rdmsrl(MSR_KERNEL_GS_BASE, shadowgs); - if (!all) - return; - cr0 = read_cr0(); cr2 = read_cr2(); cr3 = __read_cr3(); diff --git a/arch/x86/kernel/unwind_orc.c b/arch/x86/kernel/unwind_orc.c index ff8e1132b2ae..be86a865087a 100644 --- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -253,22 +253,15 @@ unsigned long *unwind_get_return_address_ptr(struct unwind_state *state) return NULL; } -static bool stack_access_ok(struct unwind_state *state, unsigned long addr, +static bool stack_access_ok(struct unwind_state *state, unsigned long _addr, size_t len) { struct stack_info *info = &state->stack_info; + void *addr = (void *)_addr; - /* - * If the address isn't on the current stack, switch to the next one. - * - * We may have to traverse multiple stacks to deal with the possibility - * that info->next_sp could point to an empty stack and the address - * could be on a subsequent stack. - */ - while (!on_stack(info, (void *)addr, len)) - if (get_stack_info(info->next_sp, state->task, info, - &state->stack_mask)) - return false; + if (!on_stack(info, addr, len) && + (get_stack_info(addr, state->task, info, &state->stack_mask))) + return false; return true; } @@ -283,42 +276,32 @@ static bool deref_stack_reg(struct unwind_state *state, unsigned long addr, return true; } -#define REGS_SIZE (sizeof(struct pt_regs)) -#define SP_OFFSET (offsetof(struct pt_regs, sp)) -#define IRET_REGS_SIZE (REGS_SIZE - offsetof(struct pt_regs, ip)) -#define IRET_SP_OFFSET (SP_OFFSET - offsetof(struct pt_regs, ip)) - static bool deref_stack_regs(struct unwind_state *state, unsigned long addr, - unsigned long *ip, unsigned long *sp, bool full) + unsigned long *ip, unsigned long *sp) { - size_t regs_size = full ? REGS_SIZE : IRET_REGS_SIZE; - size_t sp_offset = full ? 
SP_OFFSET : IRET_SP_OFFSET; - struct pt_regs *regs = (struct pt_regs *)(addr + regs_size - REGS_SIZE); - - if (IS_ENABLED(CONFIG_X86_64)) { - if (!stack_access_ok(state, addr, regs_size)) - return false; - - *ip = regs->ip; - *sp = regs->sp; + struct pt_regs *regs = (struct pt_regs *)addr; - return true; - } + /* x86-32 support will be more complicated due to the ®s->sp hack */ + BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_32)); - if (!stack_access_ok(state, addr, sp_offset)) + if (!stack_access_ok(state, addr, sizeof(struct pt_regs))) return false; *ip = regs->ip; + *sp = regs->sp; + return true; +} - if (user_mode(regs)) { - if (!stack_access_ok(state, addr + sp_offset, - REGS_SIZE - SP_OFFSET)) - return false; +static bool deref_stack_iret_regs(struct unwind_state *state, unsigned long addr, + unsigned long *ip, unsigned long *sp) +{ + struct pt_regs *regs = (void *)addr - IRET_FRAME_OFFSET; - *sp = regs->sp; - } else - *sp = (unsigned long)®s->sp; + if (!stack_access_ok(state, addr, IRET_FRAME_SIZE)) + return false; + *ip = regs->ip; + *sp = regs->sp; return true; } @@ -327,7 +310,6 @@ bool unwind_next_frame(struct unwind_state *state) unsigned long ip_p, sp, orig_ip, prev_sp = state->sp; enum stack_type prev_type = state->stack_info.type; struct orc_entry *orc; - struct pt_regs *ptregs; bool indirect = false; if (unwind_done(state)) @@ -435,7 +417,7 @@ bool unwind_next_frame(struct unwind_state *state) break; case ORC_TYPE_REGS: - if (!deref_stack_regs(state, sp, &state->ip, &state->sp, true)) { + if (!deref_stack_regs(state, sp, &state->ip, &state->sp)) { orc_warn("can't dereference registers at %p for ip %pB\n", (void *)sp, (void *)orig_ip); goto done; @@ -447,20 +429,14 @@ bool unwind_next_frame(struct unwind_state *state) break; case ORC_TYPE_REGS_IRET: - if (!deref_stack_regs(state, sp, &state->ip, &state->sp, false)) { + if (!deref_stack_iret_regs(state, sp, &state->ip, &state->sp)) { orc_warn("can't dereference iret registers at %p for ip %pB\n", (void *)sp, (void *)orig_ip); goto done; } - ptregs = container_of((void *)sp, struct pt_regs, ip); - if ((unsigned long)ptregs >= prev_sp && - on_stack(&state->stack_info, ptregs, REGS_SIZE)) { - state->regs = ptregs; - state->full_regs = false; - } else - state->regs = NULL; - + state->regs = (void *)sp - IRET_FRAME_OFFSET; + state->full_regs = false; state->signal = true; break; From 996d087af015da99632fde450092923d574d4bbc Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:10 +0100 Subject: [PATCH 0631/1013] x86/irq: Remove an old outdated comment about context tracking races commit 6669a692605547892a026445e460bf233958bd7f upstream. That race has been fixed and code cleaned up for a while now. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.150551639@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/irq.c | 12 ------------ 1 file changed, 12 deletions(-) diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 52089c043160..aa9d51eea9d0 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -219,18 +219,6 @@ __visible unsigned int __irq_entry do_IRQ(struct pt_regs *regs) /* high bit used in ret_from_ code */ unsigned vector = ~regs->orig_ax; - /* - * NB: Unlike exception entries, IRQ entries do not reliably - * handle context tracking in the low-level entry code. This is - * because syscall entries execute briefly with IRQs on before - * updating context tracking state, so we can take an IRQ from - * kernel mode with CONTEXT_USER. The low-level entry code only - * updates the context if we came from user mode, so we won't - * switch to CONTEXT_KERNEL. We'll fix that once the syscall - * code is cleaned up enough that we can cleanly defer enabling - * IRQs. - */ - entering_irq(); /* entering_irq() tells RCU that we're not quiescent. Check it. */ From e9b7b111e5be68e41e1984bcaa03535cb5e76141 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:11 +0100 Subject: [PATCH 0632/1013] x86/irq/64: Print the offending IP in the stack overflow warning commit 4f3789e792296e21405f708cf3cb409d7c7d5683 upstream. In case something goes wrong with unwind (not unlikely in case of overflow), print the offending IP where we detected the overflow. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.231677119@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/irq_64.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/irq_64.c b/arch/x86/kernel/irq_64.c index 020efbf5786b..d86e344f5b3d 100644 --- a/arch/x86/kernel/irq_64.c +++ b/arch/x86/kernel/irq_64.c @@ -57,10 +57,10 @@ static inline void stack_overflow_check(struct pt_regs *regs) if (regs->sp >= estack_top && regs->sp <= estack_bottom) return; - WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx)\n", + WARN_ONCE(1, "do_IRQ(): %s has overflown the kernel stack (cur:%Lx,sp:%lx,irq stk top-bottom:%Lx-%Lx,exception stk top-bottom:%Lx-%Lx,ip:%pF)\n", current->comm, curbase, regs->sp, irq_stack_top, irq_stack_bottom, - estack_top, estack_bottom); + estack_top, estack_bottom, (void *)regs->ip); if (sysctl_panic_on_stackoverflow) panic("low stack detected by irq handler - check messages\n"); From 9b654aba0360c10baa7cc8d15f976e7ca1af44ac Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:12 +0100 Subject: [PATCH 0633/1013] x86/entry/64: Allocate and enable the SYSENTER stack commit 1a79797b58cddfa948420a7553241c79c013e3ca upstream. This will simplify future changes that want scratch variables early in the SYSENTER handler -- they'll be able to spill registers to the stack. It also lets us get rid of a SWAPGS_UNSAFE_STACK user. This does not depend on CONFIG_IA32_EMULATION=y because we'll want the stack space even without IA32 emulation. As far as I can tell, the reason that this wasn't done from day 1 is that we use IST for #DB and #BP, which is IMO rather nasty and causes a lot more problems than it solves. But, since #DB uses IST, we don't actually need a real stack for SYSENTER (because SYSENTER with TF set will invoke #DB on the IST stack rather than the SYSENTER stack). I want to remove IST usage from these vectors some day, and this patch is a prerequisite for that as well. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.312726423@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64_compat.S | 2 +- arch/x86/include/asm/processor.h | 3 --- arch/x86/kernel/asm-offsets.c | 5 +++++ arch/x86/kernel/asm-offsets_32.c | 5 ----- arch/x86/kernel/cpu/common.c | 4 +++- arch/x86/kernel/process.c | 2 -- arch/x86/kernel/traps.c | 3 +-- 7 files changed, 10 insertions(+), 14 deletions(-) diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 568e130d932c..dcc6987f9bae 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -48,7 +48,7 @@ */ ENTRY(entry_SYSENTER_compat) /* Interrupts are off on entry. */ - SWAPGS_UNSAFE_STACK + SWAPGS movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp /* diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 2db7cf720b04..789dad5da20f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -339,14 +339,11 @@ struct tss_struct { */ unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; -#ifdef CONFIG_X86_32 /* * Space for the temporary SYSENTER stack. */ unsigned long SYSENTER_stack_canary; unsigned long SYSENTER_stack[64]; -#endif - } ____cacheline_aligned; DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss); diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 8ea78275480d..b275863128eb 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -93,4 +93,9 @@ void common(void) { BLANK(); DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); + + /* Offset from cpu_tss to SYSENTER_stack */ + OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack); + /* Size of SYSENTER_stack */ + DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack)); } diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c index dedf428b20b6..52ce4ea16e53 100644 --- a/arch/x86/kernel/asm-offsets_32.c +++ b/arch/x86/kernel/asm-offsets_32.c @@ -50,11 +50,6 @@ void foo(void) DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) - offsetofend(struct tss_struct, SYSENTER_stack)); - /* Offset from cpu_tss to SYSENTER_stack */ - OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack); - /* Size of SYSENTER_stack */ - DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack)); - #ifdef CONFIG_CC_STACKPROTECTOR BLANK(); OFFSET(stack_canary_offset, stack_canary, canary); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index cdf79ab628c2..22f542170198 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1361,7 +1361,9 @@ void syscall_init(void) * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit). 
*/ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS); - wrmsrl_safe(MSR_IA32_SYSENTER_ESP, 0ULL); + wrmsrl_safe(MSR_IA32_SYSENTER_ESP, + (unsigned long)this_cpu_ptr(&cpu_tss) + + offsetofend(struct tss_struct, SYSENTER_stack)); wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat); #else wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 97fb3e5737f5..35d674157fda 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -71,9 +71,7 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { */ .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 }, #endif -#ifdef CONFIG_X86_32 .SYSENTER_stack_canary = STACK_END_MAGIC, -#endif }; EXPORT_PER_CPU_SYMBOL(cpu_tss); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index d366adfc61da..d3e3bbd5d3a0 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -794,14 +794,13 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) debug_stack_usage_dec(); exit: -#if defined(CONFIG_X86_32) /* * This is the most likely code path that involves non-trivial use * of the SYSENTER stack. Check that we haven't overrun it. */ WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC, "Overran or corrupted SYSENTER stack\n"); -#endif + ist_exit(regs); } NOKPROBE_SYMBOL(do_debug); From 2329da3fc03d19a8fc3bb212972b97e9f57abb32 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:13 +0100 Subject: [PATCH 0634/1013] x86/dumpstack: Add get_stack_info() support for the SYSENTER stack commit 33a2f1a6c4d7c0a02d1c006fb0379cc5ca3b96bb upstream. get_stack_info() doesn't currently know about the SYSENTER stack, so unwinding will fail if we entered the kernel on the SYSENTER stack and haven't fully switched off. Teach get_stack_info() about the SYSENTER stack. With future patches applied that run part of the entry code on the SYSENTER stack and introduce an intentional BUG(), I would get: PANIC: double fault, error_code: 0x0 ... RIP: 0010:do_error_trap+0x33/0x1c0 ... Call Trace: Code: ... With this patch, I get: PANIC: double fault, error_code: 0x0 ... Call Trace: ? async_page_fault+0x36/0x60 ? invalid_op+0x22/0x40 ? async_page_fault+0x36/0x60 ? sync_regs+0x3c/0x40 ? sync_regs+0x2e/0x40 ? error_entry+0x6c/0xd0 ? async_page_fault+0x36/0x60 Code: ... which is a lot more informative. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.392711508@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/stacktrace.h | 3 +++ arch/x86/kernel/dumpstack.c | 19 +++++++++++++++++++ arch/x86/kernel/dumpstack_32.c | 6 ++++++ arch/x86/kernel/dumpstack_64.c | 6 ++++++ 4 files changed, 34 insertions(+) diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h index 8da111b3c342..f8062bfd43a0 100644 --- a/arch/x86/include/asm/stacktrace.h +++ b/arch/x86/include/asm/stacktrace.h @@ -16,6 +16,7 @@ enum stack_type { STACK_TYPE_TASK, STACK_TYPE_IRQ, STACK_TYPE_SOFTIRQ, + STACK_TYPE_SYSENTER, STACK_TYPE_EXCEPTION, STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1, }; @@ -28,6 +29,8 @@ struct stack_info { bool in_task_stack(unsigned long *stack, struct task_struct *task, struct stack_info *info); +bool in_sysenter_stack(unsigned long *stack, struct stack_info *info); + int get_stack_info(unsigned long *stack, struct task_struct *task, struct stack_info *info, unsigned long *visit_mask); diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 0bc95be5c638..a33a1373a252 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -43,6 +43,25 @@ bool in_task_stack(unsigned long *stack, struct task_struct *task, return true; } +bool in_sysenter_stack(unsigned long *stack, struct stack_info *info) +{ + struct tss_struct *tss = this_cpu_ptr(&cpu_tss); + + /* Treat the canary as part of the stack for unwinding purposes. 
*/ + void *begin = &tss->SYSENTER_stack_canary; + void *end = (void *)&tss->SYSENTER_stack + sizeof(tss->SYSENTER_stack); + + if ((void *)stack < begin || (void *)stack >= end) + return false; + + info->type = STACK_TYPE_SYSENTER; + info->begin = begin; + info->end = end; + info->next_sp = NULL; + + return true; +} + static void printk_stack_address(unsigned long address, int reliable, char *log_lvl) { diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c index daefae83a3aa..5ff13a6b3680 100644 --- a/arch/x86/kernel/dumpstack_32.c +++ b/arch/x86/kernel/dumpstack_32.c @@ -26,6 +26,9 @@ const char *stack_type_name(enum stack_type type) if (type == STACK_TYPE_SOFTIRQ) return "SOFTIRQ"; + if (type == STACK_TYPE_SYSENTER) + return "SYSENTER"; + return NULL; } @@ -93,6 +96,9 @@ int get_stack_info(unsigned long *stack, struct task_struct *task, if (task != current) goto unknown; + if (in_sysenter_stack(stack, info)) + goto recursion_check; + if (in_hardirq_stack(stack, info)) goto recursion_check; diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c index 88ce2ffdb110..abc828f8c297 100644 --- a/arch/x86/kernel/dumpstack_64.c +++ b/arch/x86/kernel/dumpstack_64.c @@ -37,6 +37,9 @@ const char *stack_type_name(enum stack_type type) if (type == STACK_TYPE_IRQ) return "IRQ"; + if (type == STACK_TYPE_SYSENTER) + return "SYSENTER"; + if (type >= STACK_TYPE_EXCEPTION && type <= STACK_TYPE_EXCEPTION_LAST) return exception_stack_names[type - STACK_TYPE_EXCEPTION]; @@ -115,6 +118,9 @@ int get_stack_info(unsigned long *stack, struct task_struct *task, if (in_irq_stack(stack, info)) goto recursion_check; + if (in_sysenter_stack(stack, info)) + goto recursion_check; + goto unknown; recursion_check: From 5684dd300f675a9f5ad6a43f96f128839c98a7fd Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:14 +0100 Subject: [PATCH 0635/1013] x86/entry/gdt: Put per-CPU GDT remaps in ascending order commit aaeed3aeb39c1ba69f0a49baec8cb728121d0a91 upstream. We currently have CPU 0's GDT at the top of the GDT range and higher-numbered CPUs at lower addresses. This happens because the fixmap is upside down (index 0 is the top of the fixmap). Flip it so that GDTs are in ascending order by virtual address. This will simplify a future patch that will generalize the GDT remap to contain multiple pages. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.471561421@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/desc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 0a3e808b9123..01fd944fd721 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -63,7 +63,7 @@ static inline struct desc_struct *get_current_gdt_rw(void) /* Get the fixmap index for a specific processor */ static inline unsigned int get_cpu_gdt_ro_index(int cpu) { - return FIX_GDT_REMAP_BEGIN + cpu; + return FIX_GDT_REMAP_END - cpu; } /* Provide the fixmap address of the remapped GDT */ From ece614dcfd964ab04fb37186ea2c4b020dbc3ee9 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:15 +0100 Subject: [PATCH 0636/1013] x86/mm/fixmap: Generalize the GDT fixmap mechanism, introduce struct cpu_entry_area commit ef8813ab280507972bb57e4b1b502811ad4411e9 upstream. Currently, the GDT is an ad-hoc array of pages, one per CPU, in the fixmap. Generalize it to be an array of a new 'struct cpu_entry_area' so that we can cleanly add new things to it. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.563271721@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/desc.h | 9 +-------- arch/x86/include/asm/fixmap.h | 37 +++++++++++++++++++++++++++++++++-- arch/x86/kernel/cpu/common.c | 14 ++++++------- arch/x86/xen/mmu_pv.c | 2 +- 4 files changed, 44 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 01fd944fd721..f6f428432a68 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -60,17 +60,10 @@ static inline struct desc_struct *get_current_gdt_rw(void) return this_cpu_ptr(&gdt_page)->gdt; } -/* Get the fixmap index for a specific processor */ -static inline unsigned int get_cpu_gdt_ro_index(int cpu) -{ - return FIX_GDT_REMAP_END - cpu; -} - /* Provide the fixmap address of the remapped GDT */ static inline struct desc_struct *get_cpu_gdt_ro(int cpu) { - unsigned int idx = get_cpu_gdt_ro_index(cpu); - return (struct desc_struct *)__fix_to_virt(idx); + return (struct desc_struct *)&get_cpu_entry_area(cpu)->gdt; } /* Provide the current read-only GDT */ diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index b0c505fe9a95..b61f0242f9d0 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -44,6 +44,19 @@ extern unsigned long __FIXADDR_TOP; PAGE_SIZE) #endif +/* + * cpu_entry_area is a percpu region in the fixmap that contains things + * needed by the CPU and early entry/exit code. Real types aren't used + * for all fields here to avoid circular header dependencies. 
+ * + * Every field is a virtual alias of some other allocated backing store. + * There is no direct allocation of a struct cpu_entry_area. + */ +struct cpu_entry_area { + char gdt[PAGE_SIZE]; +}; + +#define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) /* * Here we define all the compile-time 'special' virtual @@ -101,8 +114,8 @@ enum fixed_addresses { FIX_LNW_VRTC, #endif /* Fixmap entries to remap the GDTs, one per processor. */ - FIX_GDT_REMAP_BEGIN, - FIX_GDT_REMAP_END = FIX_GDT_REMAP_BEGIN + NR_CPUS - 1, + FIX_CPU_ENTRY_AREA_TOP, + FIX_CPU_ENTRY_AREA_BOTTOM = FIX_CPU_ENTRY_AREA_TOP + (CPU_ENTRY_AREA_PAGES * NR_CPUS) - 1, #ifdef CONFIG_ACPI_APEI_GHES /* Used for GHES mapping from assorted contexts */ @@ -191,5 +204,25 @@ void __init *early_memremap_decrypted_wp(resource_size_t phys_addr, void __early_set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags); +static inline unsigned int __get_cpu_entry_area_page_index(int cpu, int page) +{ + BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0); + + return FIX_CPU_ENTRY_AREA_BOTTOM - cpu*CPU_ENTRY_AREA_PAGES - page; +} + +#define __get_cpu_entry_area_offset_index(cpu, offset) ({ \ + BUILD_BUG_ON(offset % PAGE_SIZE != 0); \ + __get_cpu_entry_area_page_index(cpu, offset / PAGE_SIZE); \ + }) + +#define get_cpu_entry_area_index(cpu, field) \ + __get_cpu_entry_area_offset_index((cpu), offsetof(struct cpu_entry_area, field)) + +static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) +{ + return (struct cpu_entry_area *)__fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0)); +} + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_FIXMAP_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 22f542170198..2cb394dc4153 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -466,12 +466,12 @@ void load_percpu_segment(int cpu) load_stack_canary_segment(); } -/* Setup the fixmap mapping only once per-processor */ -static inline void setup_fixmap_gdt(int cpu) +/* Setup the fixmap mappings only once per-processor */ +static inline void setup_cpu_entry_area(int cpu) { #ifdef CONFIG_X86_64 /* On 64-bit systems, we use a read-only fixmap GDT. */ - pgprot_t prot = PAGE_KERNEL_RO; + pgprot_t gdt_prot = PAGE_KERNEL_RO; #else /* * On native 32-bit systems, the GDT cannot be read-only because @@ -482,11 +482,11 @@ static inline void setup_fixmap_gdt(int cpu) * On Xen PV, the GDT must be read-only because the hypervisor requires * it. */ - pgprot_t prot = boot_cpu_has(X86_FEATURE_XENPV) ? + pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ? PAGE_KERNEL_RO : PAGE_KERNEL; #endif - __set_fixmap(get_cpu_gdt_ro_index(cpu), get_cpu_gdt_paddr(cpu), prot); + __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); } /* Load the original GDT from the per-cpu structure */ @@ -1589,7 +1589,7 @@ void cpu_init(void) if (is_uv_system()) uv_cpu_init(); - setup_fixmap_gdt(cpu); + setup_cpu_entry_area(cpu); load_fixmap_gdt(cpu); } @@ -1651,7 +1651,7 @@ void cpu_init(void) fpu__init_cpu(); - setup_fixmap_gdt(cpu); + setup_cpu_entry_area(cpu); load_fixmap_gdt(cpu); } #endif diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 2ccdaba31a07..c2454237fa67 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -2272,7 +2272,7 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot) #endif case FIX_TEXT_POKE0: case FIX_TEXT_POKE1: - case FIX_GDT_REMAP_BEGIN ... FIX_GDT_REMAP_END: + case FIX_CPU_ENTRY_AREA_TOP ... 
FIX_CPU_ENTRY_AREA_BOTTOM: /* All local page mappings */ pte = pfn_pte(phys, prot); break; From 487f3ddcd9863d38490ca4f6f2b9e67d4f8a197c Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:16 +0100 Subject: [PATCH 0637/1013] x86/kasan/64: Teach KASAN about the cpu_entry_area commit 21506525fb8ddb0342f2a2370812d47f6a1f3833 upstream. The cpu_entry_area will contain stacks. Make sure that KASAN has appropriate shadow mappings for them. Signed-off-by: Andy Lutomirski Signed-off-by: Andrey Ryabinin Signed-off-by: Thomas Gleixner Cc: Alexander Potapenko Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Dmitry Vyukov Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: kasan-dev@googlegroups.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.642806442@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/kasan_init_64.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 99dfed6dfef8..9ec70d780f1f 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -277,6 +277,7 @@ void __init kasan_early_init(void) void __init kasan_init(void) { int i; + void *shadow_cpu_entry_begin, *shadow_cpu_entry_end; #ifdef CONFIG_KASAN_INLINE register_die_notifier(&kasan_die_notifier); @@ -329,8 +330,23 @@ void __init kasan_init(void) (unsigned long)kasan_mem_to_shadow(_end), early_pfn_to_nid(__pa(_stext))); + shadow_cpu_entry_begin = (void *)__fix_to_virt(FIX_CPU_ENTRY_AREA_BOTTOM); + shadow_cpu_entry_begin = kasan_mem_to_shadow(shadow_cpu_entry_begin); + shadow_cpu_entry_begin = (void *)round_down((unsigned long)shadow_cpu_entry_begin, + PAGE_SIZE); + + shadow_cpu_entry_end = (void *)(__fix_to_virt(FIX_CPU_ENTRY_AREA_TOP) + PAGE_SIZE); + shadow_cpu_entry_end = kasan_mem_to_shadow(shadow_cpu_entry_end); + shadow_cpu_entry_end = (void *)round_up((unsigned long)shadow_cpu_entry_end, + PAGE_SIZE); + kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), - (void *)KASAN_SHADOW_END); + shadow_cpu_entry_begin); + + kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin, + (unsigned long)shadow_cpu_entry_end, 0); + + kasan_populate_zero_shadow(shadow_cpu_entry_end, (void *)KASAN_SHADOW_END); load_cr3(init_top_pgt); __flush_tlb_all(); From 5be136953f62c551c1f40ddbb83c2f4e8fc9c41f Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:17 +0100 Subject: [PATCH 0638/1013] x86/entry: Fix assumptions that the HW TSS is at the beginning of cpu_tss commit 7fb983b4dd569e08564134a850dfd4eb1c63d9b8 upstream. A future patch will move SYSENTER_stack to the beginning of cpu_tss to help detect overflow. Before this can happen, fix several code paths that hardcode assumptions about the old layout. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Dave Hansen Reviewed-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.722425540@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/desc.h | 2 +- arch/x86/include/asm/processor.h | 9 ++++++-- arch/x86/kernel/cpu/common.c | 8 +++---- arch/x86/kernel/doublefault.c | 36 +++++++++++++++----------------- arch/x86/kvm/vmx.c | 2 +- arch/x86/power/cpu.c | 13 ++++++------ 6 files changed, 37 insertions(+), 33 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index f6f428432a68..2ace1f90d138 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -178,7 +178,7 @@ static inline void set_tssldt_descriptor(void *d, unsigned long addr, #endif } -static inline void __set_tss_desc(unsigned cpu, unsigned int entry, void *addr) +static inline void __set_tss_desc(unsigned cpu, unsigned int entry, struct x86_hw_tss *addr) { struct desc_struct *d = get_cpu_gdt_rw(cpu); tss_desc tss; diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 789dad5da20f..555c9478f3df 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -162,7 +162,7 @@ enum cpuid_regs_idx { extern struct cpuinfo_x86 boot_cpu_data; extern struct cpuinfo_x86 new_cpu_data; -extern struct tss_struct doublefault_tss; +extern struct x86_hw_tss doublefault_tss; extern __u32 cpu_caps_cleared[NCAPINTS]; extern __u32 cpu_caps_set[NCAPINTS]; @@ -252,6 +252,11 @@ static inline void load_cr3(pgd_t *pgdir) write_cr3(__sme_pa(pgdir)); } +/* + * Note that while the legacy 'TSS' name comes from 'Task State Segment', + * on modern x86 CPUs the TSS also holds information important to 64-bit mode, + * unrelated to the task-switch mechanism: + */ #ifdef CONFIG_X86_32 /* This is the TSS defined by the hardware. */ struct x86_hw_tss { @@ -322,7 +327,7 @@ struct x86_hw_tss { #define IO_BITMAP_BITS 65536 #define IO_BITMAP_BYTES (IO_BITMAP_BITS/8) #define IO_BITMAP_LONGS (IO_BITMAP_BYTES/sizeof(long)) -#define IO_BITMAP_OFFSET offsetof(struct tss_struct, io_bitmap) +#define IO_BITMAP_OFFSET (offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss)) #define INVALID_IO_BITMAP_OFFSET 0x8000 struct tss_struct { diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 2cb394dc4153..3f285b973f50 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1557,7 +1557,7 @@ void cpu_init(void) } } - t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap); + t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET; /* * <= is required because the CPU will access up to @@ -1576,7 +1576,7 @@ void cpu_init(void) * Initialize the TSS. Don't bother initializing sp0, as the initial * task never enters user mode. */ - set_tss_desc(cpu, t); + set_tss_desc(cpu, &t->x86_tss); load_TR_desc(); load_mm_ldt(&init_mm); @@ -1634,12 +1634,12 @@ void cpu_init(void) * Initialize the TSS. Don't bother initializing sp0, as the initial * task never enters user mode. 
*/ - set_tss_desc(cpu, t); + set_tss_desc(cpu, &t->x86_tss); load_TR_desc(); load_mm_ldt(&init_mm); - t->x86_tss.io_bitmap_base = offsetof(struct tss_struct, io_bitmap); + t->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET; #ifdef CONFIG_DOUBLEFAULT /* Set up doublefault TSS pointer in the GDT */ diff --git a/arch/x86/kernel/doublefault.c b/arch/x86/kernel/doublefault.c index 0e662c55ae90..0b8cedb20d6d 100644 --- a/arch/x86/kernel/doublefault.c +++ b/arch/x86/kernel/doublefault.c @@ -50,25 +50,23 @@ static void doublefault_fn(void) cpu_relax(); } -struct tss_struct doublefault_tss __cacheline_aligned = { - .x86_tss = { - .sp0 = STACK_START, - .ss0 = __KERNEL_DS, - .ldt = 0, - .io_bitmap_base = INVALID_IO_BITMAP_OFFSET, - - .ip = (unsigned long) doublefault_fn, - /* 0x2 bit is always set */ - .flags = X86_EFLAGS_SF | 0x2, - .sp = STACK_START, - .es = __USER_DS, - .cs = __KERNEL_CS, - .ss = __KERNEL_DS, - .ds = __USER_DS, - .fs = __KERNEL_PERCPU, - - .__cr3 = __pa_nodebug(swapper_pg_dir), - } +struct x86_hw_tss doublefault_tss __cacheline_aligned = { + .sp0 = STACK_START, + .ss0 = __KERNEL_DS, + .ldt = 0, + .io_bitmap_base = INVALID_IO_BITMAP_OFFSET, + + .ip = (unsigned long) doublefault_fn, + /* 0x2 bit is always set */ + .flags = X86_EFLAGS_SF | 0x2, + .sp = STACK_START, + .es = __USER_DS, + .cs = __KERNEL_CS, + .ss = __KERNEL_DS, + .ds = __USER_DS, + .fs = __KERNEL_PERCPU, + + .__cr3 = __pa_nodebug(swapper_pg_dir), }; /* dummy for do_double_fault() call */ diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index bc5921c1e2f2..b50b02d23f8a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2295,7 +2295,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) * processors. See 22.2.4. */ vmcs_writel(HOST_TR_BASE, - (unsigned long)this_cpu_ptr(&cpu_tss)); + (unsigned long)this_cpu_ptr(&cpu_tss.x86_tss)); vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt); /* 22.2.4 */ /* diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 84fcfde53f8f..50593e138281 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -165,12 +165,13 @@ static void fix_processor_context(void) struct desc_struct *desc = get_cpu_gdt_rw(cpu); tss_desc tss; #endif - set_tss_desc(cpu, t); /* - * This just modifies memory; should not be - * necessary. But... This is necessary, because - * 386 hardware has concept of busy TSS or some - * similar stupidity. - */ + + /* + * This just modifies memory; should not be necessary. But... This is + * necessary, because 386 hardware has concept of busy TSS or some + * similar stupidity. + */ + set_tss_desc(cpu, &t->x86_tss); #ifdef CONFIG_X86_64 memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc)); From 41964ef17cce3f2497c1ec428d22b42fb615da8d Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:18 +0100 Subject: [PATCH 0639/1013] x86/dumpstack: Handle stack overflow on all stacks commit 6e60e583426c2f8751c22c2dfe5c207083b4483a upstream. We currently special-case stack overflow on the task stack. We're going to start putting special stacks in the fixmap with a custom layout, so they'll have guard pages, too. Teach the unwinder to be able to unwind an overflow of any of the stacks. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.802057305@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/dumpstack.c | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index a33a1373a252..64f8ed2a4827 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -112,24 +112,28 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, * - task stack * - interrupt stack * - HW exception stacks (double fault, nmi, debug, mce) + * - SYSENTER stack * - * x86-32 can have up to three stacks: + * x86-32 can have up to four stacks: * - task stack * - softirq stack * - hardirq stack + * - SYSENTER stack */ for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { const char *stack_name; - /* - * If we overflowed the task stack into a guard page, jump back - * to the bottom of the usable stack. - */ - if (task_stack_page(task) - (void *)stack < PAGE_SIZE) - stack = task_stack_page(task); - - if (get_stack_info(stack, task, &stack_info, &visit_mask)) - break; + if (get_stack_info(stack, task, &stack_info, &visit_mask)) { + /* + * We weren't on a valid stack. It's possible that + * we overflowed a valid stack into a guard page. + * See if the next page up is valid so that we can + * generate some kind of backtrace if this happens. + */ + stack = (unsigned long *)PAGE_ALIGN((unsigned long)stack); + if (get_stack_info(stack, task, &stack_info, &visit_mask)) + break; + } stack_name = stack_type_name(stack_info.type); if (stack_name) From 969f5706f6f51a4a01a0dfedfa405b33e25fb708 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:19 +0100 Subject: [PATCH 0640/1013] x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct commit 1a935bc3d4ea61556461a9e92a68ca3556232efd upstream. SYSENTER_stack should have reliable overflow detection, which means that it needs to be at the bottom of a page, not the top. Move it to the beginning of struct tss_struct and page-align it. Also add an assertion to make sure that the fixed hardware TSS doesn't cross a page boundary. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.881827433@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 21 ++++++++++++--------- arch/x86/kernel/cpu/common.c | 21 +++++++++++++++++++++ 2 files changed, 33 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 555c9478f3df..759051251664 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -332,7 +332,16 @@ struct x86_hw_tss { struct tss_struct { /* - * The hardware state: + * Space for the temporary SYSENTER stack, used for SYSENTER + * and the entry trampoline as well. + */ + unsigned long SYSENTER_stack_canary; + unsigned long SYSENTER_stack[64]; + + /* + * The fixed hardware portion. This must not cross a page boundary + * at risk of violating the SDM's advice and potentially triggering + * errata. */ struct x86_hw_tss x86_tss; @@ -343,15 +352,9 @@ struct tss_struct { * be within the limit. */ unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; +} __aligned(PAGE_SIZE); - /* - * Space for the temporary SYSENTER stack. - */ - unsigned long SYSENTER_stack_canary; - unsigned long SYSENTER_stack[64]; -} ____cacheline_aligned; - -DECLARE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss); +DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss); /* * sizeof(unsigned long) coming from an extra "long" at the end diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 3f285b973f50..60b2dfd2a58b 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -487,6 +487,27 @@ static inline void setup_cpu_entry_area(int cpu) #endif __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); + + /* + * The Intel SDM says (Volume 3, 7.2.1): + * + * Avoid placing a page boundary in the part of the TSS that the + * processor reads during a task switch (the first 104 bytes). The + * processor may not correctly perform address translations if a + * boundary occurs in this area. During a task switch, the processor + * reads and writes into the first 104 bytes of each TSS (using + * contiguous physical addresses beginning with the physical address + * of the first byte of the TSS). So, after TSS access begins, if + * part of the 104 bytes is not physically contiguous, the processor + * will access incorrect information without generating a page-fault + * exception. + * + * There are also a lot of errata involving the TSS spanning a page + * boundary. Assert that we're not doing that. + */ + BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^ + offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK); + } /* Load the original GDT from the per-cpu structure */ From 5bb40c6d4c2a453b21e8e7c2b9bae87e77c4d878 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:20 +0100 Subject: [PATCH 0641/1013] x86/entry: Remap the TSS into the CPU entry area commit 72f5e08dbba2d01aa90b592cf76c378ea233b00b upstream. This has a secondary purpose: it puts the entry stack into a region with a well-controlled layout. A subsequent patch will take advantage of this to streamline the SYSCALL entry code to be able to find it more easily. 
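As a rough illustration (an editorial sketch, not part of the patch), the following user-space C model mimics the cpu_entry_area layout this change creates. The struct names and field sizes are simplified stand-ins, not the kernel's real tss_struct or cpu_entry_area definitions; the point is only that the read-only GDT page sits directly below the TSS and the SYSENTER stack sits at the very start of the TSS, so an overflow of that stack runs into a read-only page and faults instead of silently corrupting adjacent data.

/*
 * Toy user-space model of the cpu_entry_area layout. Sizes and field
 * names are illustrative stand-ins, not the kernel's actual structures.
 */
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define TOY_PAGE_SIZE 4096UL

struct toy_tss {
	unsigned long SYSENTER_stack[64];	/* entry stack at the bottom of the TSS */
	unsigned char hw_tss[104];		/* stand-in for the fixed hardware portion */
	unsigned long io_bitmap[1024 + 1];	/* stand-in for the I/O bitmap */
} __attribute__((aligned(TOY_PAGE_SIZE)));

struct toy_cpu_entry_area {
	char gdt[TOY_PAGE_SIZE];		/* mapped read-only on x86_64 */
	struct toy_tss tss;
};

int main(void)
{
	/* The TSS starts exactly one page into the area, right above the GDT page. */
	assert(offsetof(struct toy_cpu_entry_area, tss) == TOY_PAGE_SIZE);

	/* The SYSENTER stack is the first field of the TSS, so the read-only
	   GDT page acts as its guard page. */
	assert(offsetof(struct toy_tss, SYSENTER_stack) == 0);

	printf("toy cpu_entry_area: %zu bytes (%zu pages)\n",
	       (size_t)sizeof(struct toy_cpu_entry_area),
	       (size_t)(sizeof(struct toy_cpu_entry_area) / TOY_PAGE_SIZE));
	return 0;
}

Both assertions hold on any 4 KiB-page build; they restate the guard-page argument that the fixmap.h comment in the diff below makes for the real structures.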
Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150605.962042855@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_32.S | 6 +++-- arch/x86/include/asm/fixmap.h | 7 ++++++ arch/x86/kernel/asm-offsets.c | 3 +++ arch/x86/kernel/cpu/common.c | 41 ++++++++++++++++++++++++++++++----- arch/x86/kernel/dumpstack.c | 3 ++- arch/x86/kvm/vmx.c | 2 +- arch/x86/power/cpu.c | 11 +++++----- 7 files changed, 58 insertions(+), 15 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 4838037f97f6..0ab316c46806 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -941,7 +941,8 @@ ENTRY(debug) movl %esp, %eax # pt_regs pointer /* Are we currently on the SYSENTER stack? */ - PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx) + movl PER_CPU_VAR(cpu_entry_area), %ecx + addl $CPU_ENTRY_AREA_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ cmpl $SIZEOF_SYSENTER_stack, %ecx jb .Ldebug_from_sysenter_stack @@ -984,7 +985,8 @@ ENTRY(nmi) movl %esp, %eax # pt_regs pointer /* Are we currently on the SYSENTER stack? */ - PER_CPU(cpu_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx) + movl PER_CPU_VAR(cpu_entry_area), %ecx + addl $CPU_ENTRY_AREA_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ cmpl $SIZEOF_SYSENTER_stack, %ecx jb .Lnmi_from_sysenter_stack diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index b61f0242f9d0..84558b611ad3 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -54,6 +54,13 @@ extern unsigned long __FIXADDR_TOP; */ struct cpu_entry_area { char gdt[PAGE_SIZE]; + + /* + * The GDT is just below cpu_tss and thus serves (on x86_64) as a + * a read-only guard page for the SYSENTER stack at the bottom + * of the TSS region. 
+ */ + struct tss_struct tss; }; #define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index b275863128eb..55858b277cf6 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -98,4 +98,7 @@ void common(void) { OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack); /* Size of SYSENTER_stack */ DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack)); + + /* Layout info for cpu_entry_area */ + OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss); } diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 60b2dfd2a58b..e5837bd6c672 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -466,6 +466,22 @@ void load_percpu_segment(int cpu) load_stack_canary_segment(); } +static void set_percpu_fixmap_pages(int fixmap_index, void *ptr, + int pages, pgprot_t prot) +{ + int i; + + for (i = 0; i < pages; i++) { + __set_fixmap(fixmap_index - i, + per_cpu_ptr_to_phys(ptr + i * PAGE_SIZE), prot); + } +} + +#ifdef CONFIG_X86_32 +/* The 32-bit entry code needs to find cpu_entry_area. */ +DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); +#endif + /* Setup the fixmap mappings only once per-processor */ static inline void setup_cpu_entry_area(int cpu) { @@ -507,7 +523,15 @@ static inline void setup_cpu_entry_area(int cpu) */ BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^ offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK); + BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0); + set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss), + &per_cpu(cpu_tss, cpu), + sizeof(struct tss_struct) / PAGE_SIZE, + PAGE_KERNEL); +#ifdef CONFIG_X86_32 + this_cpu_write(cpu_entry_area, get_cpu_entry_area(cpu)); +#endif } /* Load the original GDT from the per-cpu structure */ @@ -1257,7 +1281,8 @@ void enable_sep_cpu(void) wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0); wrmsr(MSR_IA32_SYSENTER_ESP, - (unsigned long)tss + offsetofend(struct tss_struct, SYSENTER_stack), + (unsigned long)&get_cpu_entry_area(cpu)->tss + + offsetofend(struct tss_struct, SYSENTER_stack), 0); wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0); @@ -1370,6 +1395,8 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks /* May not be marked __init: used by software suspend */ void syscall_init(void) { + int cpu = smp_processor_id(); + wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS); wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64); @@ -1383,7 +1410,7 @@ void syscall_init(void) */ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS); wrmsrl_safe(MSR_IA32_SYSENTER_ESP, - (unsigned long)this_cpu_ptr(&cpu_tss) + + (unsigned long)&get_cpu_entry_area(cpu)->tss + offsetofend(struct tss_struct, SYSENTER_stack)); wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat); #else @@ -1593,11 +1620,13 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, me); + setup_cpu_entry_area(cpu); + /* * Initialize the TSS. Don't bother initializing sp0, as the initial * task never enters user mode. 
*/ - set_tss_desc(cpu, &t->x86_tss); + set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); load_TR_desc(); load_mm_ldt(&init_mm); @@ -1610,7 +1639,6 @@ void cpu_init(void) if (is_uv_system()) uv_cpu_init(); - setup_cpu_entry_area(cpu); load_fixmap_gdt(cpu); } @@ -1651,11 +1679,13 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, curr); + setup_cpu_entry_area(cpu); + /* * Initialize the TSS. Don't bother initializing sp0, as the initial * task never enters user mode. */ - set_tss_desc(cpu, &t->x86_tss); + set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); load_TR_desc(); load_mm_ldt(&init_mm); @@ -1672,7 +1702,6 @@ void cpu_init(void) fpu__init_cpu(); - setup_cpu_entry_area(cpu); load_fixmap_gdt(cpu); } #endif diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 64f8ed2a4827..60267850125e 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -45,7 +45,8 @@ bool in_task_stack(unsigned long *stack, struct task_struct *task, bool in_sysenter_stack(unsigned long *stack, struct stack_info *info) { - struct tss_struct *tss = this_cpu_ptr(&cpu_tss); + int cpu = smp_processor_id(); + struct tss_struct *tss = &get_cpu_entry_area(cpu)->tss; /* Treat the canary as part of the stack for unwinding purposes. */ void *begin = &tss->SYSENTER_stack_canary; diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b50b02d23f8a..47d9432756f3 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -2295,7 +2295,7 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu) * processors. See 22.2.4. */ vmcs_writel(HOST_TR_BASE, - (unsigned long)this_cpu_ptr(&cpu_tss.x86_tss)); + (unsigned long)&get_cpu_entry_area(cpu)->tss.x86_tss); vmcs_writel(HOST_GDTR_BASE, (unsigned long)gdt); /* 22.2.4 */ /* diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index 50593e138281..04d5157fe7f8 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -160,18 +160,19 @@ static void do_fpu_end(void) static void fix_processor_context(void) { int cpu = smp_processor_id(); - struct tss_struct *t = &per_cpu(cpu_tss, cpu); #ifdef CONFIG_X86_64 struct desc_struct *desc = get_cpu_gdt_rw(cpu); tss_desc tss; #endif /* - * This just modifies memory; should not be necessary. But... This is - * necessary, because 386 hardware has concept of busy TSS or some - * similar stupidity. + * We need to reload TR, which requires that we change the + * GDT entry to indicate "available" first. + * + * XXX: This could probably all be replaced by a call to + * force_reload_TR(). */ - set_tss_desc(cpu, &t->x86_tss); + set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); #ifdef CONFIG_X86_64 memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc)); From d120cd749ef9770ee98b708a83b49547dcf1c0e1 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:21 +0100 Subject: [PATCH 0642/1013] x86/entry/64: Separate cpu_current_top_of_stack from TSS.sp0 commit 9aaefe7b59ae00605256a7d6bd1c1456432495fc upstream. On 64-bit kernels, we used to assume that TSS.sp0 was the current top of stack. With the addition of an entry trampoline, this will no longer be the case. Store the current top of stack in TSS.sp1, which is otherwise unused but shares the same cacheline. 
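A tiny standalone C sketch (again an editorial aside, not kernel code) can make the "shares the same cacheline" point concrete. The field order mirrors the x86_64 hardware TSS as laid out in the processor.h hunk that follows; the toy type name and the assumption of 64-byte cache lines are illustrative.

/*
 * Sketch of the 64-bit hardware TSS layout: sp1 sits a few bytes after
 * sp0, so repurposing it costs no extra cache line.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct toy_hw_tss64 {
	uint32_t reserved1;
	uint64_t sp0;
	uint64_t sp1;		/* repurposed as cpu_current_top_of_stack */
	uint64_t sp2;
	uint64_t reserved2;
	uint64_t ist[7];
	/* trailing reserved fields and io_bitmap_base omitted */
} __attribute__((packed));

int main(void)
{
	size_t sp0 = offsetof(struct toy_hw_tss64, sp0);
	size_t sp1 = offsetof(struct toy_hw_tss64, sp1);

	printf("sp0 at offset %zu, sp1 at offset %zu\n", sp0, sp1);

	/*
	 * Assuming the TSS itself is at least cache-line aligned (in the
	 * kernel it is page-aligned), sp0 and sp1 land in the same 64-byte
	 * cache line.
	 */
	assert(sp0 / 64 == sp1 / 64);
	return 0;
}

With sp0 at byte offset 4 and sp1 at offset 12, both fields fall in the first cache line of a suitably aligned TSS, which is why stashing cpu_current_top_of_stack in sp1 costs essentially nothing on the switch path.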
Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.050864668@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 18 +++++++++++++----- arch/x86/include/asm/thread_info.h | 2 +- arch/x86/kernel/asm-offsets_64.c | 1 + arch/x86/kernel/process.c | 10 ++++++++++ arch/x86/kernel/process_64.c | 1 + 5 files changed, 26 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 759051251664..b0cf0612a454 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -309,7 +309,13 @@ struct x86_hw_tss { struct x86_hw_tss { u32 reserved1; u64 sp0; + + /* + * We store cpu_current_top_of_stack in sp1 so it's always accessible. + * Linux does not use ring 1, so sp1 is not otherwise needed. + */ u64 sp1; + u64 sp2; u64 reserved2; u64 ist[7]; @@ -368,6 +374,8 @@ DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss); #ifdef CONFIG_X86_32 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack); +#else +#define cpu_current_top_of_stack cpu_tss.x86_tss.sp1 #endif /* @@ -539,12 +547,12 @@ static inline void native_swapgs(void) static inline unsigned long current_top_of_stack(void) { -#ifdef CONFIG_X86_64 - return this_cpu_read_stable(cpu_tss.x86_tss.sp0); -#else - /* sp0 on x86_32 is special in and around vm86 mode. */ + /* + * We can't read directly from tss.sp0: sp0 on x86_32 is special in + * and around vm86 mode and sp0 on x86_64 is special because of the + * entry trampoline. + */ return this_cpu_read_stable(cpu_current_top_of_stack); -#endif } static inline bool on_thread_stack(void) diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 70f425947dc5..44a04999791e 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -207,7 +207,7 @@ static inline int arch_within_stack_frames(const void * const stack, #else /* !__ASSEMBLY__ */ #ifdef CONFIG_X86_64 -# define cpu_current_top_of_stack (cpu_tss + TSS_sp0) +# define cpu_current_top_of_stack (cpu_tss + TSS_sp1) #endif #endif diff --git a/arch/x86/kernel/asm-offsets_64.c b/arch/x86/kernel/asm-offsets_64.c index e3a5175a444b..bf51e51d808d 100644 --- a/arch/x86/kernel/asm-offsets_64.c +++ b/arch/x86/kernel/asm-offsets_64.c @@ -66,6 +66,7 @@ int main(void) OFFSET(TSS_ist, tss_struct, x86_tss.ist); OFFSET(TSS_sp0, tss_struct, x86_tss.sp0); + OFFSET(TSS_sp1, tss_struct, x86_tss.sp1); BLANK(); #ifdef CONFIG_CC_STACKPROTECTOR diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 35d674157fda..86e83762e3b3 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -56,6 +56,16 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { * Poison it. */ .sp0 = (1UL << (BITS_PER_LONG-1)) + 1, + +#ifdef CONFIG_X86_64 + /* + * .sp1 is cpu_current_top_of_stack. The init task never + * runs user code, but cpu_current_top_of_stack should still + * be well defined before the first context switch. 
+ */ + .sp1 = TOP_OF_INIT_STACK, +#endif + #ifdef CONFIG_X86_32 .ss0 = __KERNEL_DS, .ss1 = __KERNEL_CS, diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 01b119bebb68..157f81816915 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -461,6 +461,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) * Switch the PDA and FPU contexts. */ this_cpu_write(current_task, next_p); + this_cpu_write(cpu_current_top_of_stack, task_top_of_stack(next_p)); /* Reload sp0. */ update_sp0(next_p); From c3dbef1bd0f7eb09daf49409ea533aa1b0eeb82e Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:22 +0100 Subject: [PATCH 0643/1013] x86/espfix/64: Stop assuming that pt_regs is on the entry stack commit 6d9256f0a89eaff97fca6006100bcaea8d1d8bdb upstream. When we start using an entry trampoline, a #GP from userspace will be delivered on the entry stack, not on the task stack. Fix the espfix64 #DF fixup to set up #GP according to TSS.SP0, rather than assuming that pt_regs + 1 == SP0. This won't change anything without an entry stack, but it will make the code continue to work when an entry stack is added. While we're at it, improve the comments to explain what's actually going on. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.130778051@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/traps.c | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index d3e3bbd5d3a0..f0029d17b14b 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -348,9 +348,15 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) /* * If IRET takes a non-IST fault on the espfix64 stack, then we - * end up promoting it to a doublefault. In that case, modify - * the stack to make it look like we just entered the #GP - * handler from user space, similar to bad_iret. + * end up promoting it to a doublefault. In that case, take + * advantage of the fact that we're not using the normal (TSS.sp0) + * stack right now. We can write a fake #GP(0) frame at TSS.sp0 + * and then modify our own IRET frame so that, when we return, + * we land directly at the #GP(0) vector with the stack already + * set up according to its expectations. + * + * The net result is that our #GP handler will think that we + * entered from usermode with the bad user context. * * No need for ist_enter here because we don't use RCU. */ @@ -358,13 +364,26 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) regs->cs == __KERNEL_CS && regs->ip == (unsigned long)native_irq_return_iret) { - struct pt_regs *normal_regs = task_pt_regs(current); + struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss.x86_tss.sp0) - 1; + + /* + * regs->sp points to the failing IRET frame on the + * ESPFIX64 stack. Copy it to the entry stack. 
This fills + * in gpregs->ss through gpregs->ip. + * + */ + memmove(&gpregs->ip, (void *)regs->sp, 5*8); + gpregs->orig_ax = 0; /* Missing (lost) #GP error code */ - /* Fake a #GP(0) from userspace. */ - memmove(&normal_regs->ip, (void *)regs->sp, 5*8); - normal_regs->orig_ax = 0; /* Missing (lost) #GP error code */ + /* + * Adjust our frame so that we return straight to the #GP + * vector with the expected RSP value. This is safe because + * we won't enable interupts or schedule before we invoke + * general_protection, so nothing will clobber the stack + * frame we just set up. + */ regs->ip = (unsigned long)general_protection; - regs->sp = (unsigned long)&normal_regs->orig_ax; + regs->sp = (unsigned long)&gpregs->orig_ax; return; } @@ -389,7 +408,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) * * Processors update CR2 whenever a page fault is detected. If a * second page fault occurs while an earlier page fault is being - * deliv- ered, the faulting linear address of the second fault will + * delivered, the faulting linear address of the second fault will * overwrite the contents of CR2 (replacing the previous * address). These updates to CR2 occur even if the page fault * results in a double fault or occurs during the delivery of a From 2bc9fa0beaf10206a778f02e9e5cb62f50345b1a Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:23 +0100 Subject: [PATCH 0644/1013] x86/entry/64: Use a per-CPU trampoline stack for IDT entries commit 7f2590a110b837af5679d08fc25c6227c5a8c497 upstream. Historically, IDT entries from usermode have always gone directly to the running task's kernel stack. Rearrange it so that we enter on a per-CPU trampoline stack and then manually switch to the task's stack. This touches a couple of extra cachelines, but it gives us a chance to run some code before we touch the kernel stack. The asm isn't exactly beautiful, but I think that fully refactoring it can wait. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.225330557@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 67 ++++++++++++++++++++++++-------- arch/x86/entry/entry_64_compat.S | 5 ++- arch/x86/include/asm/switch_to.h | 4 +- arch/x86/include/asm/traps.h | 1 - arch/x86/kernel/cpu/common.c | 6 ++- arch/x86/kernel/traps.c | 21 +++++----- 6 files changed, 72 insertions(+), 32 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 162ce952d716..9b39ce935af7 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -560,6 +560,13 @@ END(irq_entries_start) /* 0(%rsp): ~(interrupt number) */ .macro interrupt func cld + + testb $3, CS-ORIG_RAX(%rsp) + jz 1f + SWAPGS + call switch_to_thread_stack +1: + ALLOC_PT_GPREGS_ON_STACK SAVE_C_REGS SAVE_EXTRA_REGS @@ -569,12 +576,8 @@ END(irq_entries_start) jz 1f /* - * IRQ from user mode. Switch to kernel gsbase and inform context - * tracking that we're in kernel mode. 
- */ - SWAPGS - - /* + * IRQ from user mode. + * * We need to tell lockdep that IRQs are off. We can't do this until * we fix gsbase, and we should do it before enter_from_user_mode * (which can take locks). Since TRACE_IRQS_OFF idempotent, @@ -828,6 +831,32 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt */ #define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8) +/* + * Switch to the thread stack. This is called with the IRET frame and + * orig_ax on the stack. (That is, RDI..R12 are not on the stack and + * space has not been allocated for them.) + */ +ENTRY(switch_to_thread_stack) + UNWIND_HINT_FUNC + + pushq %rdi + movq %rsp, %rdi + movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp + UNWIND_HINT sp_offset=16 sp_reg=ORC_REG_DI + + pushq 7*8(%rdi) /* regs->ss */ + pushq 6*8(%rdi) /* regs->rsp */ + pushq 5*8(%rdi) /* regs->eflags */ + pushq 4*8(%rdi) /* regs->cs */ + pushq 3*8(%rdi) /* regs->ip */ + pushq 2*8(%rdi) /* regs->orig_ax */ + pushq 8(%rdi) /* return address */ + UNWIND_HINT_FUNC + + movq (%rdi), %rdi + ret +END(switch_to_thread_stack) + .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1 ENTRY(\sym) UNWIND_HINT_IRET_REGS offset=\has_error_code*8 @@ -845,11 +874,12 @@ ENTRY(\sym) ALLOC_PT_GPREGS_ON_STACK - .if \paranoid - .if \paranoid == 1 + .if \paranoid < 2 testb $3, CS(%rsp) /* If coming from userspace, switch stacks */ - jnz 1f + jnz .Lfrom_usermode_switch_stack_\@ .endif + + .if \paranoid call paranoid_entry .else call error_entry @@ -891,20 +921,15 @@ ENTRY(\sym) jmp error_exit .endif - .if \paranoid == 1 + .if \paranoid < 2 /* - * Paranoid entry from userspace. Switch stacks and treat it + * Entry from userspace. Switch stacks and treat it * as a normal entry. This means that paranoid handlers * run in real process context if user_mode(regs). */ -1: +.Lfrom_usermode_switch_stack_\@: call error_entry - - movq %rsp, %rdi /* pt_regs pointer */ - call sync_regs - movq %rax, %rsp /* switch stack */ - movq %rsp, %rdi /* pt_regs pointer */ .if \has_error_code @@ -1165,6 +1190,14 @@ ENTRY(error_entry) SWAPGS .Lerror_entry_from_usermode_after_swapgs: + /* Put us onto the real thread stack. */ + popq %r12 /* save return addr in %12 */ + movq %rsp, %rdi /* arg0 = pt_regs pointer */ + call sync_regs + movq %rax, %rsp /* switch stack */ + ENCODE_FRAME_POINTER + pushq %r12 + /* * We need to tell lockdep that IRQs are off. We can't do this until * we fix gsbase, and we should do it before enter_from_user_mode diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index dcc6987f9bae..95ad40eb7eff 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -306,8 +306,11 @@ ENTRY(entry_INT80_compat) */ movl %eax, %eax - /* Construct struct pt_regs on stack (iret frame is already on stack) */ pushq %rax /* pt_regs->orig_ax */ + + /* switch to thread stack expects orig_ax to be pushed */ + call switch_to_thread_stack + pushq %rdi /* pt_regs->di */ pushq %rsi /* pt_regs->si */ pushq %rdx /* pt_regs->dx */ diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index 8c6bd6863db9..cbc71e73bd32 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -90,10 +90,12 @@ static inline void refresh_sysenter_cs(struct thread_struct *thread) /* This is used when switching tasks or entering/exiting vm86 mode. 
*/ static inline void update_sp0(struct task_struct *task) { + /* On x86_64, sp0 always points to the entry trampoline stack, which is constant: */ #ifdef CONFIG_X86_32 load_sp0(task->thread.sp0); #else - load_sp0(task_top_of_stack(task)); + if (static_cpu_has(X86_FEATURE_XENPV)) + load_sp0(task_top_of_stack(task)); #endif } diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 1fadd310ff68..31051f35cbb7 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -75,7 +75,6 @@ dotraplinkage void do_segment_not_present(struct pt_regs *, long); dotraplinkage void do_stack_segment(struct pt_regs *, long); #ifdef CONFIG_X86_64 dotraplinkage void do_double_fault(struct pt_regs *, long); -asmlinkage struct pt_regs *sync_regs(struct pt_regs *); #endif dotraplinkage void do_general_protection(struct pt_regs *, long); dotraplinkage void do_page_fault(struct pt_regs *, unsigned long); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index e5837bd6c672..57968880e39b 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1623,11 +1623,13 @@ void cpu_init(void) setup_cpu_entry_area(cpu); /* - * Initialize the TSS. Don't bother initializing sp0, as the initial - * task never enters user mode. + * Initialize the TSS. sp0 points to the entry trampoline stack + * regardless of what task is running. */ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); load_TR_desc(); + load_sp0((unsigned long)&get_cpu_entry_area(cpu)->tss + + offsetofend(struct tss_struct, SYSENTER_stack)); load_mm_ldt(&init_mm); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index f0029d17b14b..ee9ca0ad4388 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -619,14 +619,15 @@ NOKPROBE_SYMBOL(do_int3); #ifdef CONFIG_X86_64 /* - * Help handler running on IST stack to switch off the IST stack if the - * interrupted code was in user mode. The actual stack switch is done in - * entry_64.S + * Help handler running on a per-cpu (IST or entry trampoline) stack + * to switch to the normal thread stack if the interrupted code was in + * user mode. The actual stack switch is done in entry_64.S */ asmlinkage __visible notrace struct pt_regs *sync_regs(struct pt_regs *eregs) { - struct pt_regs *regs = task_pt_regs(current); - *regs = *eregs; + struct pt_regs *regs = (struct pt_regs *)this_cpu_read(cpu_current_top_of_stack) - 1; + if (regs != eregs) + *regs = *eregs; return regs; } NOKPROBE_SYMBOL(sync_regs); @@ -642,13 +643,13 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s) /* * This is called from entry_64.S early in handling a fault * caused by a bad iret to user mode. To handle the fault - * correctly, we want move our stack frame to task_pt_regs - * and we want to pretend that the exception came from the - * iret target. + * correctly, we want to move our stack frame to where it would + * be had we entered directly on the entry stack (rather than + * just below the IRET frame) and we want to pretend that the + * exception came from the IRET target. */ struct bad_iret_stack *new_stack = - container_of(task_pt_regs(current), - struct bad_iret_stack, regs); + (struct bad_iret_stack *)this_cpu_read(cpu_tss.x86_tss.sp0) - 1; /* Copy the IRET target to the new stack. 
*/ memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8); From 564cea11777e9d73aa04ce5f6fbe9169d7c116a2 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:24 +0100 Subject: [PATCH 0645/1013] x86/entry/64: Return to userspace from the trampoline stack commit 3e3b9293d392c577b62e24e4bc9982320438e749 upstream. By itself, this is useless. It gives us the ability to run some final code before exit that cannot run on the kernel stack. This could include a CR3 switch a la PAGE_TABLE_ISOLATION or some kernel stack erasing, for example. (Or even weird things like *changing* which kernel stack gets used as an ASLR-strengthening mechanism.) The SYSRET32 path is not covered yet. It could be in the future or we could just ignore it and force the slow path if needed. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.306546484@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 55 ++++++++++++++++++++++++++++++++++++--- 1 file changed, 51 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 9b39ce935af7..9323585c4de6 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -326,8 +326,24 @@ syscall_return_via_sysret: popq %rsi /* skip rcx */ popq %rdx popq %rsi + + /* + * Now all regs are restored except RSP and RDI. + * Save old stack pointer and switch to trampoline stack. + */ + movq %rsp, %rdi + movq PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp + + pushq RSP-RDI(%rdi) /* RSP */ + pushq (%rdi) /* RDI */ + + /* + * We are on the trampoline stack. All regs except RDI are live. + * We can do future final exit work right here. + */ + popq %rdi - movq RSP-ORIG_RAX(%rsp), %rsp + popq %rsp USERGS_SYSRET64 END(entry_SYSCALL_64) @@ -630,10 +646,41 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode) ud2 1: #endif - SWAPGS POP_EXTRA_REGS - POP_C_REGS - addq $8, %rsp /* skip regs->orig_ax */ + popq %r11 + popq %r10 + popq %r9 + popq %r8 + popq %rax + popq %rcx + popq %rdx + popq %rsi + + /* + * The stack is now user RDI, orig_ax, RIP, CS, EFLAGS, RSP, SS. + * Save old stack pointer and switch to trampoline stack. + */ + movq %rsp, %rdi + movq PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp + + /* Copy the IRET frame to the trampoline stack. */ + pushq 6*8(%rdi) /* SS */ + pushq 5*8(%rdi) /* RSP */ + pushq 4*8(%rdi) /* EFLAGS */ + pushq 3*8(%rdi) /* CS */ + pushq 2*8(%rdi) /* RIP */ + + /* Push user RDI on the trampoline stack. */ + pushq (%rdi) + + /* + * We are on the trampoline stack. All regs except RDI are live. + * We can do future final exit work right here. + */ + + /* Restore RDI. */ + popq %rdi + SWAPGS INTERRUPT_RETURN From c631a16e5b84521ae80ab10d47cd6ccd2afc6817 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:25 +0100 Subject: [PATCH 0646/1013] x86/entry/64: Create a per-CPU SYSCALL entry trampoline commit 3386bc8aed825e9f1f65ce38df4b109b2019b71a upstream.
Handling SYSCALL is tricky: the SYSCALL handler is entered with every single register (except FLAGS), including RSP, live. It somehow needs to set RSP to point to a valid stack, which means it needs to save the user RSP somewhere and find its own stack pointer. The canonical way to do this is with SWAPGS, which lets us access percpu data using the %gs prefix. With PAGE_TABLE_ISOLATION-like pagetable switching, this is problematic. Without a scratch register, switching CR3 is impossible, so %gs-based percpu memory would need to be mapped in the user pagetables. Doing that without information leaks is difficult or impossible. Instead, use a different sneaky trick. Map a copy of the first part of the SYSCALL asm at a different address for each CPU. Now RIP varies depending on the CPU, so we can use RIP-relative memory access to access percpu memory. By putting the relevant information (one scratch slot and the stack address) at a constant offset relative to RIP, we can make SYSCALL work without relying on %gs. A nice thing about this approach is that we can easily switch it on and off if we want pagetable switching to be configurable. The compat variant of SYSCALL doesn't have this problem in the first place -- there are plenty of scratch registers, since we don't care about preserving r8-r15. This patch therefore doesn't touch SYSCALL32 at all. This patch actually seems to be a small speedup. With this patch, SYSCALL touches an extra cache line and an extra virtual page, but the pipeline no longer stalls waiting for SWAPGS. It seems that, at least in a tight loop, the latter outweighs the former. Thanks to David Laight for an optimization tip. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.403607157@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64.S | 58 +++++++++++++++++++++++++++++++++++ arch/x86/include/asm/fixmap.h | 2 ++ arch/x86/kernel/asm-offsets.c | 1 + arch/x86/kernel/cpu/common.c | 15 ++++++++- arch/x86/kernel/vmlinux.lds.S | 9 ++++++ 5 files changed, 84 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 9323585c4de6..47ec428d101a 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -136,6 +136,64 @@ END(native_usergs_sysret64) * with them due to bugs in both AMD and Intel CPUs. */ + .pushsection .entry_trampoline, "ax" + +/* + * The code in here gets remapped into cpu_entry_area's trampoline. This means + * that the assembler and linker have the wrong idea as to where this code + * lives (and, in fact, it's mapped more than once, so it's not even at a + * fixed address). So we can't reference any symbols outside the entry + * trampoline and expect it to work. + * + * Instead, we carefully abuse %rip-relative addressing. + * _entry_trampoline(%rip) refers to the start of the remapped entry + * trampoline.
We can thus find cpu_entry_area with this macro: + */ + +#define CPU_ENTRY_AREA \ + _entry_trampoline - CPU_ENTRY_AREA_entry_trampoline(%rip) + +/* The top word of the SYSENTER stack is hot and is usable as scratch space. */ +#define RSP_SCRATCH CPU_ENTRY_AREA_tss + CPU_TSS_SYSENTER_stack + \ + SIZEOF_SYSENTER_stack - 8 + CPU_ENTRY_AREA + +ENTRY(entry_SYSCALL_64_trampoline) + UNWIND_HINT_EMPTY + swapgs + + /* Stash the user RSP. */ + movq %rsp, RSP_SCRATCH + + /* Load the top of the task stack into RSP */ + movq CPU_ENTRY_AREA_tss + TSS_sp1 + CPU_ENTRY_AREA, %rsp + + /* Start building the simulated IRET frame. */ + pushq $__USER_DS /* pt_regs->ss */ + pushq RSP_SCRATCH /* pt_regs->sp */ + pushq %r11 /* pt_regs->flags */ + pushq $__USER_CS /* pt_regs->cs */ + pushq %rcx /* pt_regs->ip */ + + /* + * x86 lacks a near absolute jump, and we can't jump to the real + * entry text with a relative jump. We could push the target + * address and then use retq, but this destroys the pipeline on + * many CPUs (wasting over 20 cycles on Sandy Bridge). Instead, + * spill RDI and restore it in a second-stage trampoline. + */ + pushq %rdi + movq $entry_SYSCALL_64_stage2, %rdi + jmp *%rdi +END(entry_SYSCALL_64_trampoline) + + .popsection + +ENTRY(entry_SYSCALL_64_stage2) + UNWIND_HINT_EMPTY + popq %rdi + jmp entry_SYSCALL_64_after_hwframe +END(entry_SYSCALL_64_stage2) + ENTRY(entry_SYSCALL_64) UNWIND_HINT_EMPTY /* diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 84558b611ad3..6a699474c2c7 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -61,6 +61,8 @@ struct cpu_entry_area { * of the TSS region. */ struct tss_struct tss; + + char entry_trampoline[PAGE_SIZE]; }; #define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 55858b277cf6..61b1af88ac07 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -101,4 +101,5 @@ void common(void) { /* Layout info for cpu_entry_area */ OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss); + OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline); } diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 57968880e39b..430f950b0b7f 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -486,6 +486,8 @@ DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); static inline void setup_cpu_entry_area(int cpu) { #ifdef CONFIG_X86_64 + extern char _entry_trampoline[]; + /* On 64-bit systems, we use a read-only fixmap GDT. 
*/ pgprot_t gdt_prot = PAGE_KERNEL_RO; #else @@ -532,6 +534,11 @@ static inline void setup_cpu_entry_area(int cpu) #ifdef CONFIG_X86_32 this_cpu_write(cpu_entry_area, get_cpu_entry_area(cpu)); #endif + +#ifdef CONFIG_X86_64 + __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline), + __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX); +#endif } /* Load the original GDT from the per-cpu structure */ @@ -1395,10 +1402,16 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks /* May not be marked __init: used by software suspend */ void syscall_init(void) { + extern char _entry_trampoline[]; + extern char entry_SYSCALL_64_trampoline[]; + int cpu = smp_processor_id(); + unsigned long SYSCALL64_entry_trampoline = + (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline + + (entry_SYSCALL_64_trampoline - _entry_trampoline); wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS); - wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64); + wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline); #ifdef CONFIG_IA32_EMULATION wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat); diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index a4009fb9be87..d2a8b5a24a44 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -107,6 +107,15 @@ SECTIONS SOFTIRQENTRY_TEXT *(.fixup) *(.gnu.warning) + +#ifdef CONFIG_X86_64 + . = ALIGN(PAGE_SIZE); + _entry_trampoline = .; + *(.entry_trampoline) + . = ALIGN(PAGE_SIZE); + ASSERT(. - _entry_trampoline == PAGE_SIZE, "entry trampoline is too big"); +#endif + /* End of text section */ _etext = .; } :text = 0x9090 From bb568391775d4a840992e2d2493f39d6e86401e3 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:26 +0100 Subject: [PATCH 0647/1013] x86/entry/64: Move the IST stacks into struct cpu_entry_area commit 40e7f949e0d9a33968ebde5d67f7e3a47c97742a upstream. The IST stacks are needed when an IST exception occurs and are accessed before any kernel code at all runs. Move them into struct cpu_entry_area. The IST stacks are unlike the rest of cpu_entry_area: they're used even for entries from kernel mode. This means that they should be set up before we load the final IDT. Move cpu_entry_area setup to trap_init() for the boot CPU and set it up for all possible CPUs at once in native_smp_prepare_cpus(). Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.480598743@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/fixmap.h | 12 ++++++ arch/x86/kernel/cpu/common.c | 74 ++++++++++++++++++++--------------- arch/x86/kernel/traps.c | 3 ++ 3 files changed, 57 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 6a699474c2c7..451da7d9a502 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -63,10 +63,22 @@ struct cpu_entry_area { struct tss_struct tss; char entry_trampoline[PAGE_SIZE]; + +#ifdef CONFIG_X86_64 + /* + * Exception stacks used for IST entries. 
+ * + * In the future, this should have a separate slot for each stack + * with guard pages between them. + */ + char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]; +#endif }; #define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) +extern void setup_cpu_entry_areas(void); + /* * Here we define all the compile-time 'special' virtual * addresses. The point is to have a constant address at diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 430f950b0b7f..fb01a8e5e9b7 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -466,24 +466,36 @@ void load_percpu_segment(int cpu) load_stack_canary_segment(); } -static void set_percpu_fixmap_pages(int fixmap_index, void *ptr, - int pages, pgprot_t prot) -{ - int i; - - for (i = 0; i < pages; i++) { - __set_fixmap(fixmap_index - i, - per_cpu_ptr_to_phys(ptr + i * PAGE_SIZE), prot); - } -} - #ifdef CONFIG_X86_32 /* The 32-bit entry code needs to find cpu_entry_area. */ DEFINE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); #endif +#ifdef CONFIG_X86_64 +/* + * Special IST stacks which the CPU switches to when it calls + * an IST-marked descriptor entry. Up to 7 stacks (hardware + * limit), all of them are 4K, except the debug stack which + * is 8K. + */ +static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = { + [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ, + [DEBUG_STACK - 1] = DEBUG_STKSZ +}; + +static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks + [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); +#endif + +static void __init +set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot) +{ + for ( ; pages; pages--, idx--, ptr += PAGE_SIZE) + __set_fixmap(idx, per_cpu_ptr_to_phys(ptr), prot); +} + /* Setup the fixmap mappings only once per-processor */ -static inline void setup_cpu_entry_area(int cpu) +static void __init setup_cpu_entry_area(int cpu) { #ifdef CONFIG_X86_64 extern char _entry_trampoline[]; @@ -532,15 +544,31 @@ static inline void setup_cpu_entry_area(int cpu) PAGE_KERNEL); #ifdef CONFIG_X86_32 - this_cpu_write(cpu_entry_area, get_cpu_entry_area(cpu)); + per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu); #endif #ifdef CONFIG_X86_64 + BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0); + BUILD_BUG_ON(sizeof(exception_stacks) != + sizeof(((struct cpu_entry_area *)0)->exception_stacks)); + set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, exception_stacks), + &per_cpu(exception_stacks, cpu), + sizeof(exception_stacks) / PAGE_SIZE, + PAGE_KERNEL); + __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline), __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX); #endif } +void __init setup_cpu_entry_areas(void) +{ + unsigned int cpu; + + for_each_possible_cpu(cpu) + setup_cpu_entry_area(cpu); +} + /* Load the original GDT from the per-cpu structure */ void load_direct_gdt(int cpu) { @@ -1385,20 +1413,6 @@ DEFINE_PER_CPU(unsigned int, irq_count) __visible = -1; DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT; EXPORT_PER_CPU_SYMBOL(__preempt_count); -/* - * Special IST stacks which the CPU switches to when it calls - * an IST-marked descriptor entry. Up to 7 stacks (hardware - * limit), all of them are 4K, except the debug stack which - * is 8K. - */ -static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = { - [0 ... 
N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ, - [DEBUG_STACK - 1] = DEBUG_STKSZ -}; - -static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks - [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); - /* May not be marked __init: used by software suspend */ void syscall_init(void) { @@ -1607,7 +1621,7 @@ void cpu_init(void) * set up and load the per-CPU TSS */ if (!oist->ist[0]) { - char *estacks = per_cpu(exception_stacks, cpu); + char *estacks = get_cpu_entry_area(cpu)->exception_stacks; for (v = 0; v < N_EXCEPTION_STACKS; v++) { estacks += exception_stack_sizes[v]; @@ -1633,8 +1647,6 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, me); - setup_cpu_entry_area(cpu); - /* * Initialize the TSS. sp0 points to the entry trampoline stack * regardless of what task is running. @@ -1694,8 +1706,6 @@ void cpu_init(void) initialize_tlbstate_and_flush(); enter_lazy_tlb(&init_mm, curr); - setup_cpu_entry_area(cpu); - /* * Initialize the TSS. Don't bother initializing sp0, as the initial * task never enters user mode. diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index ee9ca0ad4388..3e29aad5c7cc 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -947,6 +947,9 @@ dotraplinkage void do_iret_error(struct pt_regs *regs, long error_code) void __init trap_init(void) { + /* Init cpu_entry_area before IST entries are set up */ + setup_cpu_entry_areas(); + idt_setup_traps(); /* From b25ca49efac58fe9744b9a3f725c4e4d69b276f8 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:27 +0100 Subject: [PATCH 0648/1013] x86/entry/64: Remove the SYSENTER stack canary commit 7fbbd5cbebf118a9e09f5453f686656a167c3d1c upstream. Now that the SYSENTER stack has a guard page, there's no need for a canary to detect overflow after the fact. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.572577316@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/processor.h | 1 - arch/x86/kernel/dumpstack.c | 3 +-- arch/x86/kernel/process.c | 1 - arch/x86/kernel/traps.c | 7 ------- 4 files changed, 1 insertion(+), 11 deletions(-) diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index b0cf0612a454..d34ac13c5866 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -341,7 +341,6 @@ struct tss_struct { * Space for the temporary SYSENTER stack, used for SYSENTER * and the entry trampoline as well. */ - unsigned long SYSENTER_stack_canary; unsigned long SYSENTER_stack[64]; /* diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 60267850125e..ae1ce2e3f132 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -48,8 +48,7 @@ bool in_sysenter_stack(unsigned long *stack, struct stack_info *info) int cpu = smp_processor_id(); struct tss_struct *tss = &get_cpu_entry_area(cpu)->tss; - /* Treat the canary as part of the stack for unwinding purposes. 
*/ - void *begin = &tss->SYSENTER_stack_canary; + void *begin = &tss->SYSENTER_stack; void *end = (void *)&tss->SYSENTER_stack + sizeof(tss->SYSENTER_stack); if ((void *)stack < begin || (void *)stack >= end) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 86e83762e3b3..6a04287f222b 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -81,7 +81,6 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { */ .io_bitmap = { [0 ... IO_BITMAP_LONGS] = ~0 }, #endif - .SYSENTER_stack_canary = STACK_END_MAGIC, }; EXPORT_PER_CPU_SYMBOL(cpu_tss); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 3e29aad5c7cc..5ade4f89a6d1 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -814,13 +814,6 @@ dotraplinkage void do_debug(struct pt_regs *regs, long error_code) debug_stack_usage_dec(); exit: - /* - * This is the most likely code path that involves non-trivial use - * of the SYSENTER stack. Check that we haven't overrun it. - */ - WARN(this_cpu_read(cpu_tss.SYSENTER_stack_canary) != STACK_END_MAGIC, - "Overran or corrupted SYSENTER stack\n"); - ist_exit(regs); } NOKPROBE_SYMBOL(do_debug); From e313437c85da22a26fe1ba01d94abec7698b1987 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:28 +0100 Subject: [PATCH 0649/1013] x86/entry: Clean up the SYSENTER_stack code commit 0f9a48100fba3f189724ae88a450c2261bf91c80 upstream. The existing code was a mess, mainly because C arrays are nasty. Turn SYSENTER_stack into a struct, add a helper to find it, and do all the obvious cleanups this enables. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.653244723@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_32.S | 4 ++-- arch/x86/entry/entry_64.S | 2 +- arch/x86/include/asm/fixmap.h | 5 +++++ arch/x86/include/asm/processor.h | 6 +++++- arch/x86/kernel/asm-offsets.c | 6 ++---- arch/x86/kernel/cpu/common.c | 14 +++----------- arch/x86/kernel/dumpstack.c | 7 +++---- 7 files changed, 21 insertions(+), 23 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 0ab316c46806..3629bcbf85a2 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -942,7 +942,7 @@ ENTRY(debug) /* Are we currently on the SYSENTER stack? */ movl PER_CPU_VAR(cpu_entry_area), %ecx - addl $CPU_ENTRY_AREA_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx + addl $CPU_ENTRY_AREA_tss + TSS_STRUCT_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ cmpl $SIZEOF_SYSENTER_stack, %ecx jb .Ldebug_from_sysenter_stack @@ -986,7 +986,7 @@ ENTRY(nmi) /* Are we currently on the SYSENTER stack? 
*/ movl PER_CPU_VAR(cpu_entry_area), %ecx - addl $CPU_ENTRY_AREA_tss + CPU_TSS_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx + addl $CPU_ENTRY_AREA_tss + TSS_STRUCT_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ cmpl $SIZEOF_SYSENTER_stack, %ecx jb .Lnmi_from_sysenter_stack diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 47ec428d101a..ba0933532f2f 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -154,7 +154,7 @@ END(native_usergs_sysret64) _entry_trampoline - CPU_ENTRY_AREA_entry_trampoline(%rip) /* The top word of the SYSENTER stack is hot and is usable as scratch space. */ -#define RSP_SCRATCH CPU_ENTRY_AREA_tss + CPU_TSS_SYSENTER_stack + \ +#define RSP_SCRATCH CPU_ENTRY_AREA_tss + TSS_STRUCT_SYSENTER_stack + \ SIZEOF_SYSENTER_stack - 8 + CPU_ENTRY_AREA ENTRY(entry_SYSCALL_64_trampoline) diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 451da7d9a502..cc5d98bdca37 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -245,5 +245,10 @@ static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) return (struct cpu_entry_area *)__fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0)); } +static inline struct SYSENTER_stack *cpu_SYSENTER_stack(int cpu) +{ + return &get_cpu_entry_area(cpu)->tss.SYSENTER_stack; +} + #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_FIXMAP_H */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index d34ac13c5866..f933869470b8 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -336,12 +336,16 @@ struct x86_hw_tss { #define IO_BITMAP_OFFSET (offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss)) #define INVALID_IO_BITMAP_OFFSET 0x8000 +struct SYSENTER_stack { + unsigned long words[64]; +}; + struct tss_struct { /* * Space for the temporary SYSENTER stack, used for SYSENTER * and the entry trampoline as well. */ - unsigned long SYSENTER_stack[64]; + struct SYSENTER_stack SYSENTER_stack; /* * The fixed hardware portion. 
This must not cross a page boundary diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 61b1af88ac07..46c0995344aa 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -94,10 +94,8 @@ void common(void) { BLANK(); DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); - /* Offset from cpu_tss to SYSENTER_stack */ - OFFSET(CPU_TSS_SYSENTER_stack, tss_struct, SYSENTER_stack); - /* Size of SYSENTER_stack */ - DEFINE(SIZEOF_SYSENTER_stack, sizeof(((struct tss_struct *)0)->SYSENTER_stack)); + OFFSET(TSS_STRUCT_SYSENTER_stack, tss_struct, SYSENTER_stack); + DEFINE(SIZEOF_SYSENTER_stack, sizeof(struct SYSENTER_stack)); /* Layout info for cpu_entry_area */ OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index fb01a8e5e9b7..3de7480e4f32 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1314,12 +1314,7 @@ void enable_sep_cpu(void) tss->x86_tss.ss1 = __KERNEL_CS; wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0); - - wrmsr(MSR_IA32_SYSENTER_ESP, - (unsigned long)&get_cpu_entry_area(cpu)->tss + - offsetofend(struct tss_struct, SYSENTER_stack), - 0); - + wrmsr(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_SYSENTER_stack(cpu) + 1), 0); wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0); put_cpu(); @@ -1436,9 +1431,7 @@ void syscall_init(void) * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit). */ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS); - wrmsrl_safe(MSR_IA32_SYSENTER_ESP, - (unsigned long)&get_cpu_entry_area(cpu)->tss + - offsetofend(struct tss_struct, SYSENTER_stack)); + wrmsrl_safe(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_SYSENTER_stack(cpu) + 1)); wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat); #else wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret); @@ -1653,8 +1646,7 @@ void cpu_init(void) */ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); load_TR_desc(); - load_sp0((unsigned long)&get_cpu_entry_area(cpu)->tss + - offsetofend(struct tss_struct, SYSENTER_stack)); + load_sp0((unsigned long)(cpu_SYSENTER_stack(cpu) + 1)); load_mm_ldt(&init_mm); diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index ae1ce2e3f132..bbd6d986e2d0 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -45,11 +45,10 @@ bool in_task_stack(unsigned long *stack, struct task_struct *task, bool in_sysenter_stack(unsigned long *stack, struct stack_info *info) { - int cpu = smp_processor_id(); - struct tss_struct *tss = &get_cpu_entry_area(cpu)->tss; + struct SYSENTER_stack *ss = cpu_SYSENTER_stack(smp_processor_id()); - void *begin = &tss->SYSENTER_stack; - void *end = (void *)&tss->SYSENTER_stack + sizeof(tss->SYSENTER_stack); + void *begin = ss; + void *end = ss + 1; if ((void *)stack < begin || (void *)stack >= end) return false; From 25e2999e630c744c27c9179d1279c7f9d503385e Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:29 +0100 Subject: [PATCH 0650/1013] x86/entry/64: Make cpu_entry_area.tss read-only commit c482feefe1aeb150156248ba0fd3e029bc886605 upstream. The TSS is a fairly juicy target for exploits, and, now that the TSS is in the cpu_entry_area, it's no longer protected by kASLR. Make it read-only on x86_64. On x86_32, it can't be RO because it's written by the CPU during task switches, and we use a task gate for double faults. 
I'd also be nervous about errata if we tried to make it RO even on configurations without double fault handling. [ tglx: AMD confirmed that there is no problem on 64-bit with TSS RO. So it's probably safe to assume that it's a non issue, though Intel might have been creative in that area. Still waiting for confirmation. ] Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.733700132@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_32.S | 4 ++-- arch/x86/entry/entry_64.S | 8 ++++---- arch/x86/include/asm/fixmap.h | 13 +++++++++---- arch/x86/include/asm/processor.h | 17 ++++++++--------- arch/x86/include/asm/switch_to.h | 4 ++-- arch/x86/include/asm/thread_info.h | 2 +- arch/x86/kernel/asm-offsets.c | 5 ++--- arch/x86/kernel/asm-offsets_32.c | 4 ++-- arch/x86/kernel/cpu/common.c | 29 +++++++++++++++++++---------- arch/x86/kernel/ioport.c | 2 +- arch/x86/kernel/process.c | 6 +++--- arch/x86/kernel/process_32.c | 2 +- arch/x86/kernel/process_64.c | 2 +- arch/x86/kernel/traps.c | 4 ++-- arch/x86/lib/delay.c | 4 ++-- arch/x86/xen/enlighten_pv.c | 2 +- 16 files changed, 60 insertions(+), 48 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 3629bcbf85a2..bd8b57a5c874 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -942,7 +942,7 @@ ENTRY(debug) /* Are we currently on the SYSENTER stack? */ movl PER_CPU_VAR(cpu_entry_area), %ecx - addl $CPU_ENTRY_AREA_tss + TSS_STRUCT_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx + addl $CPU_ENTRY_AREA_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ cmpl $SIZEOF_SYSENTER_stack, %ecx jb .Ldebug_from_sysenter_stack @@ -986,7 +986,7 @@ ENTRY(nmi) /* Are we currently on the SYSENTER stack? */ movl PER_CPU_VAR(cpu_entry_area), %ecx - addl $CPU_ENTRY_AREA_tss + TSS_STRUCT_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx + addl $CPU_ENTRY_AREA_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ cmpl $SIZEOF_SYSENTER_stack, %ecx jb .Lnmi_from_sysenter_stack diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index ba0933532f2f..6abe3fcaece9 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -154,7 +154,7 @@ END(native_usergs_sysret64) _entry_trampoline - CPU_ENTRY_AREA_entry_trampoline(%rip) /* The top word of the SYSENTER stack is hot and is usable as scratch space. */ -#define RSP_SCRATCH CPU_ENTRY_AREA_tss + TSS_STRUCT_SYSENTER_stack + \ +#define RSP_SCRATCH CPU_ENTRY_AREA_SYSENTER_stack + \ SIZEOF_SYSENTER_stack - 8 + CPU_ENTRY_AREA ENTRY(entry_SYSCALL_64_trampoline) @@ -390,7 +390,7 @@ syscall_return_via_sysret: * Save old stack pointer and switch to trampoline stack. 
*/ movq %rsp, %rdi - movq PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp + movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp pushq RSP-RDI(%rdi) /* RSP */ pushq (%rdi) /* RDI */ @@ -719,7 +719,7 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode) * Save old stack pointer and switch to trampoline stack. */ movq %rsp, %rdi - movq PER_CPU_VAR(cpu_tss + TSS_sp0), %rsp + movq PER_CPU_VAR(cpu_tss_rw + TSS_sp0), %rsp /* Copy the IRET frame to the trampoline stack. */ pushq 6*8(%rdi) /* SS */ @@ -934,7 +934,7 @@ apicinterrupt IRQ_WORK_VECTOR irq_work_interrupt smp_irq_work_interrupt /* * Exception entry points. */ -#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8) +#define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss_rw) + (TSS_ist + ((x) - 1) * 8) /* * Switch to the thread stack. This is called with the IRET frame and diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index cc5d98bdca37..94fc4fa14127 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -56,9 +56,14 @@ struct cpu_entry_area { char gdt[PAGE_SIZE]; /* - * The GDT is just below cpu_tss and thus serves (on x86_64) as a - * a read-only guard page for the SYSENTER stack at the bottom - * of the TSS region. + * The GDT is just below SYSENTER_stack and thus serves (on x86_64) as + * a a read-only guard page. + */ + struct SYSENTER_stack_page SYSENTER_stack_page; + + /* + * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because + * we need task switches to work, and task switches write to the TSS. */ struct tss_struct tss; @@ -247,7 +252,7 @@ static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) static inline struct SYSENTER_stack *cpu_SYSENTER_stack(int cpu) { - return &get_cpu_entry_area(cpu)->tss.SYSENTER_stack; + return &get_cpu_entry_area(cpu)->SYSENTER_stack_page.stack; } #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index f933869470b8..e8991d7f7034 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -340,13 +340,11 @@ struct SYSENTER_stack { unsigned long words[64]; }; -struct tss_struct { - /* - * Space for the temporary SYSENTER stack, used for SYSENTER - * and the entry trampoline as well. - */ - struct SYSENTER_stack SYSENTER_stack; +struct SYSENTER_stack_page { + struct SYSENTER_stack stack; +} __aligned(PAGE_SIZE); +struct tss_struct { /* * The fixed hardware portion. This must not cross a page boundary * at risk of violating the SDM's advice and potentially triggering @@ -363,7 +361,7 @@ struct tss_struct { unsigned long io_bitmap[IO_BITMAP_LONGS + 1]; } __aligned(PAGE_SIZE); -DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss); +DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); /* * sizeof(unsigned long) coming from an extra "long" at the end @@ -378,7 +376,8 @@ DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss); #ifdef CONFIG_X86_32 DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack); #else -#define cpu_current_top_of_stack cpu_tss.x86_tss.sp1 +/* The RO copy can't be accessed with this_cpu_xyz(), so use the RW copy. 
*/ +#define cpu_current_top_of_stack cpu_tss_rw.x86_tss.sp1 #endif /* @@ -538,7 +537,7 @@ static inline void native_set_iopl_mask(unsigned mask) static inline void native_load_sp0(unsigned long sp0) { - this_cpu_write(cpu_tss.x86_tss.sp0, sp0); + this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0); } static inline void native_swapgs(void) diff --git a/arch/x86/include/asm/switch_to.h b/arch/x86/include/asm/switch_to.h index cbc71e73bd32..9b6df68d8fd1 100644 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -79,10 +79,10 @@ do { \ static inline void refresh_sysenter_cs(struct thread_struct *thread) { /* Only happens when SEP is enabled, no need to test "SEP"arately: */ - if (unlikely(this_cpu_read(cpu_tss.x86_tss.ss1) == thread->sysenter_cs)) + if (unlikely(this_cpu_read(cpu_tss_rw.x86_tss.ss1) == thread->sysenter_cs)) return; - this_cpu_write(cpu_tss.x86_tss.ss1, thread->sysenter_cs); + this_cpu_write(cpu_tss_rw.x86_tss.ss1, thread->sysenter_cs); wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0); } #endif diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index 44a04999791e..00223333821a 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -207,7 +207,7 @@ static inline int arch_within_stack_frames(const void * const stack, #else /* !__ASSEMBLY__ */ #ifdef CONFIG_X86_64 -# define cpu_current_top_of_stack (cpu_tss + TSS_sp1) +# define cpu_current_top_of_stack (cpu_tss_rw + TSS_sp1) #endif #endif diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 46c0995344aa..cd360a5e0dca 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -94,10 +94,9 @@ void common(void) { BLANK(); DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); - OFFSET(TSS_STRUCT_SYSENTER_stack, tss_struct, SYSENTER_stack); - DEFINE(SIZEOF_SYSENTER_stack, sizeof(struct SYSENTER_stack)); - /* Layout info for cpu_entry_area */ OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss); OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline); + OFFSET(CPU_ENTRY_AREA_SYSENTER_stack, cpu_entry_area, SYSENTER_stack_page); + DEFINE(SIZEOF_SYSENTER_stack, sizeof(struct SYSENTER_stack)); } diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c index 52ce4ea16e53..7d20d9c0b3d6 100644 --- a/arch/x86/kernel/asm-offsets_32.c +++ b/arch/x86/kernel/asm-offsets_32.c @@ -47,8 +47,8 @@ void foo(void) BLANK(); /* Offset from the sysenter stack to tss.sp0 */ - DEFINE(TSS_sysenter_sp0, offsetof(struct tss_struct, x86_tss.sp0) - - offsetofend(struct tss_struct, SYSENTER_stack)); + DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - + offsetofend(struct cpu_entry_area, SYSENTER_stack_page.stack)); #ifdef CONFIG_CC_STACKPROTECTOR BLANK(); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 3de7480e4f32..c2eada1056de 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -487,6 +487,9 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); #endif +static DEFINE_PER_CPU_PAGE_ALIGNED(struct SYSENTER_stack_page, + SYSENTER_stack_storage); + static void __init set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot) { @@ -500,23 +503,29 @@ static void __init setup_cpu_entry_area(int cpu) #ifdef CONFIG_X86_64 extern char _entry_trampoline[]; - /* On 64-bit systems, we use a read-only fixmap GDT. 
*/ + /* On 64-bit systems, we use a read-only fixmap GDT and TSS. */ pgprot_t gdt_prot = PAGE_KERNEL_RO; + pgprot_t tss_prot = PAGE_KERNEL_RO; #else /* * On native 32-bit systems, the GDT cannot be read-only because * our double fault handler uses a task gate, and entering through - * a task gate needs to change an available TSS to busy. If the GDT - * is read-only, that will triple fault. + * a task gate needs to change an available TSS to busy. If the + * GDT is read-only, that will triple fault. The TSS cannot be + * read-only because the CPU writes to it on task switches. * - * On Xen PV, the GDT must be read-only because the hypervisor requires - * it. + * On Xen PV, the GDT must be read-only because the hypervisor + * requires it. */ pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ? PAGE_KERNEL_RO : PAGE_KERNEL; + pgprot_t tss_prot = PAGE_KERNEL; #endif __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); + set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, SYSENTER_stack_page), + per_cpu_ptr(&SYSENTER_stack_storage, cpu), 1, + PAGE_KERNEL); /* * The Intel SDM says (Volume 3, 7.2.1): @@ -539,9 +548,9 @@ static void __init setup_cpu_entry_area(int cpu) offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK); BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0); set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss), - &per_cpu(cpu_tss, cpu), + &per_cpu(cpu_tss_rw, cpu), sizeof(struct tss_struct) / PAGE_SIZE, - PAGE_KERNEL); + tss_prot); #ifdef CONFIG_X86_32 per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu); @@ -1305,7 +1314,7 @@ void enable_sep_cpu(void) return; cpu = get_cpu(); - tss = &per_cpu(cpu_tss, cpu); + tss = &per_cpu(cpu_tss_rw, cpu); /* * We cache MSR_IA32_SYSENTER_CS's value in the TSS's ss1 field -- @@ -1575,7 +1584,7 @@ void cpu_init(void) if (cpu) load_ucode_ap(); - t = &per_cpu(cpu_tss, cpu); + t = &per_cpu(cpu_tss_rw, cpu); oist = &per_cpu(orig_ist, cpu); #ifdef CONFIG_NUMA @@ -1667,7 +1676,7 @@ void cpu_init(void) { int cpu = smp_processor_id(); struct task_struct *curr = current; - struct tss_struct *t = &per_cpu(cpu_tss, cpu); + struct tss_struct *t = &per_cpu(cpu_tss_rw, cpu); wait_for_master_cpu(cpu); diff --git a/arch/x86/kernel/ioport.c b/arch/x86/kernel/ioport.c index 3feb648781c4..2f723301eb58 100644 --- a/arch/x86/kernel/ioport.c +++ b/arch/x86/kernel/ioport.c @@ -67,7 +67,7 @@ asmlinkage long sys_ioperm(unsigned long from, unsigned long num, int turn_on) * because the ->io_bitmap_max value must match the bitmap * contents: */ - tss = &per_cpu(cpu_tss, get_cpu()); + tss = &per_cpu(cpu_tss_rw, get_cpu()); if (turn_on) bitmap_clear(t->io_bitmap_ptr, from, num); diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 6a04287f222b..517415978409 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -47,7 +47,7 @@ * section. Since TSS's are completely CPU-local, we want them * on exact cacheline boundaries, to eliminate cacheline ping-pong. */ -__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { +__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss_rw) = { .x86_tss = { /* * .sp0 is only used when entering ring 0 from a lower @@ -82,7 +82,7 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss) = { .io_bitmap = { [0 ... 
IO_BITMAP_LONGS] = ~0 }, #endif }; -EXPORT_PER_CPU_SYMBOL(cpu_tss); +EXPORT_PER_CPU_SYMBOL(cpu_tss_rw); DEFINE_PER_CPU(bool, __tss_limit_invalid); EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid); @@ -111,7 +111,7 @@ void exit_thread(struct task_struct *tsk) struct fpu *fpu = &t->fpu; if (bp) { - struct tss_struct *tss = &per_cpu(cpu_tss, get_cpu()); + struct tss_struct *tss = &per_cpu(cpu_tss_rw, get_cpu()); t->io_bitmap_ptr = NULL; clear_thread_flag(TIF_IO_BITMAP); diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c index 45bf0c5f93e1..5224c6099184 100644 --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -234,7 +234,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) struct fpu *prev_fpu = &prev->fpu; struct fpu *next_fpu = &next->fpu; int cpu = smp_processor_id(); - struct tss_struct *tss = &per_cpu(cpu_tss, cpu); + struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu); /* never put a printk in __switch_to... printk() calls wake_up*() indirectly */ diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c index 157f81816915..c75466232016 100644 --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -399,7 +399,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p) struct fpu *prev_fpu = &prev->fpu; struct fpu *next_fpu = &next->fpu; int cpu = smp_processor_id(); - struct tss_struct *tss = &per_cpu(cpu_tss, cpu); + struct tss_struct *tss = &per_cpu(cpu_tss_rw, cpu); WARN_ON_ONCE(IS_ENABLED(CONFIG_DEBUG_ENTRY) && this_cpu_read(irq_count) != -1); diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 5ade4f89a6d1..74136fd16f49 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -364,7 +364,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) regs->cs == __KERNEL_CS && regs->ip == (unsigned long)native_irq_return_iret) { - struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss.x86_tss.sp0) - 1; + struct pt_regs *gpregs = (struct pt_regs *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1; /* * regs->sp points to the failing IRET frame on the @@ -649,7 +649,7 @@ struct bad_iret_stack *fixup_bad_iret(struct bad_iret_stack *s) * exception came from the IRET target. */ struct bad_iret_stack *new_stack = - (struct bad_iret_stack *)this_cpu_read(cpu_tss.x86_tss.sp0) - 1; + (struct bad_iret_stack *)this_cpu_read(cpu_tss_rw.x86_tss.sp0) - 1; /* Copy the IRET target to the new stack. */ memmove(&new_stack->regs.ip, (void *)s->regs.sp, 5*8); diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c index 553f8fd23cc4..4846eff7e4c8 100644 --- a/arch/x86/lib/delay.c +++ b/arch/x86/lib/delay.c @@ -107,10 +107,10 @@ static void delay_mwaitx(unsigned long __loops) delay = min_t(u64, MWAITX_MAX_LOOPS, loops); /* - * Use cpu_tss as a cacheline-aligned, seldomly + * Use cpu_tss_rw as a cacheline-aligned, seldomly * accessed per-cpu variable as the monitor target. 
*/ - __monitorx(raw_cpu_ptr(&cpu_tss), 0, 0); + __monitorx(raw_cpu_ptr(&cpu_tss_rw), 0, 0); /* * AMD, like Intel, supports the EAX hint and EAX=0xf diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index fbd054d6ac97..ae3a071e1d0f 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -818,7 +818,7 @@ static void xen_load_sp0(unsigned long sp0) mcs = xen_mc_entry(0); MULTI_stack_switch(mcs.mc, __KERNEL_DS, sp0); xen_mc_issue(PARAVIRT_LAZY_CPU); - this_cpu_write(cpu_tss.x86_tss.sp0, sp0); + this_cpu_write(cpu_tss_rw.x86_tss.sp0, sp0); } void xen_set_iopl_mask(unsigned mask) From 96e63420e281205e98cca0b6dbb5a79af98a5efd Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:30 +0100 Subject: [PATCH 0651/1013] x86/paravirt: Dont patch flush_tlb_single commit a035795499ca1c2bd1928808d1a156eda1420383 upstream. native_flush_tlb_single() will be changed with the upcoming PAGE_TABLE_ISOLATION feature. This requires to have more code in there than INVLPG. Remove the paravirt patching for it. Signed-off-by: Thomas Gleixner Reviewed-by: Josh Poimboeuf Reviewed-by: Juergen Gross Acked-by: Peter Zijlstra Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Linus Torvalds Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Cc: michael.schwarz@iaik.tugraz.at Cc: moritz.lipp@iaik.tugraz.at Cc: richard.fellner@student.tugraz.at Link: https://lkml.kernel.org/r/20171204150606.828111617@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/paravirt_patch_64.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/kernel/paravirt_patch_64.c b/arch/x86/kernel/paravirt_patch_64.c index ac0be8283325..9edadabf04f6 100644 --- a/arch/x86/kernel/paravirt_patch_64.c +++ b/arch/x86/kernel/paravirt_patch_64.c @@ -10,7 +10,6 @@ DEF_NATIVE(pv_irq_ops, save_fl, "pushfq; popq %rax"); DEF_NATIVE(pv_mmu_ops, read_cr2, "movq %cr2, %rax"); DEF_NATIVE(pv_mmu_ops, read_cr3, "movq %cr3, %rax"); DEF_NATIVE(pv_mmu_ops, write_cr3, "movq %rdi, %cr3"); -DEF_NATIVE(pv_mmu_ops, flush_tlb_single, "invlpg (%rdi)"); DEF_NATIVE(pv_cpu_ops, wbinvd, "wbinvd"); DEF_NATIVE(pv_cpu_ops, usergs_sysret64, "swapgs; sysretq"); @@ -60,7 +59,6 @@ unsigned native_patch(u8 type, u16 clobbers, void *ibuf, PATCH_SITE(pv_mmu_ops, read_cr2); PATCH_SITE(pv_mmu_ops, read_cr3); PATCH_SITE(pv_mmu_ops, write_cr3); - PATCH_SITE(pv_mmu_ops, flush_tlb_single); PATCH_SITE(pv_cpu_ops, wbinvd); #if defined(CONFIG_PARAVIRT_SPINLOCKS) case PARAVIRT_PATCH(pv_lock_ops.queued_spin_unlock): From 1eb2e614fd17ae8de373aaf9919b803b0a6cfc27 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:31 +0100 Subject: [PATCH 0652/1013] x86/paravirt: Provide a way to check for hypervisors commit 79cc74155218316b9a5d28577c7077b2adba8e58 upstream. There is no generic way to test whether a kernel is running on a specific hypervisor. But that's required to prevent the upcoming user address space separation feature in certain guest modes. Make the hypervisor type enum unconditionally available and provide a helper function which allows to test for a specific type. 
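As a rough usage sketch (the function below is hypothetical and not part of this patch), a caller can branch on the hypervisor type without any CONFIG_HYPERVISOR_GUEST ifdeffery, because the stub variant of the helper still reports X86_HYPER_NATIVE:

#include <asm/hypervisor.h>

/* Hypothetical caller: apply a quirk only when running on bare metal. */
static bool example_quirk_wanted(void)
{
        return hypervisor_is_type(X86_HYPER_NATIVE);
}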
Signed-off-by: Thomas Gleixner Reviewed-by: Juergen Gross Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.912938129@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/hypervisor.h | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h index 1b0a5abcd8ae..96aa6b9884dc 100644 --- a/arch/x86/include/asm/hypervisor.h +++ b/arch/x86/include/asm/hypervisor.h @@ -20,16 +20,7 @@ #ifndef _ASM_X86_HYPERVISOR_H #define _ASM_X86_HYPERVISOR_H -#ifdef CONFIG_HYPERVISOR_GUEST - -#include -#include -#include - -/* - * x86 hypervisor information - */ - +/* x86 hypervisor types */ enum x86_hypervisor_type { X86_HYPER_NATIVE = 0, X86_HYPER_VMWARE, @@ -39,6 +30,12 @@ enum x86_hypervisor_type { X86_HYPER_KVM, }; +#ifdef CONFIG_HYPERVISOR_GUEST + +#include +#include +#include + struct hypervisor_x86 { /* Hypervisor name */ const char *name; @@ -58,7 +55,15 @@ struct hypervisor_x86 { extern enum x86_hypervisor_type x86_hyper_type; extern void init_hypervisor_platform(void); +static inline bool hypervisor_is_type(enum x86_hypervisor_type type) +{ + return x86_hyper_type == type; +} #else static inline void init_hypervisor_platform(void) { } +static inline bool hypervisor_is_type(enum x86_hypervisor_type type) +{ + return type == X86_HYPER_NATIVE; +} #endif /* CONFIG_HYPERVISOR_GUEST */ #endif /* _ASM_X86_HYPERVISOR_H */ From 8388d287e361a2fd0a39bece30a736d692d5c3d8 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:32 +0100 Subject: [PATCH 0653/1013] x86/cpufeatures: Make CPU bugs sticky commit 6cbd2171e89b13377261d15e64384df60ecb530e upstream. There is currently no way to force CPU bug bits like CPU feature bits. That makes it impossible to set a bug bit once at boot and have it stick for all upcoming CPUs. Extend the force set/clear arrays to handle bug bits as well. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Rik van Riel Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171204150606.992156574@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeature.h | 2 ++ arch/x86/include/asm/processor.h | 4 ++-- arch/x86/kernel/cpu/common.c | 6 +++--- 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index bf6a76202a77..ea9a7dde62e5 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -135,6 +135,8 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit); set_bit(bit, (unsigned long *)cpu_caps_set); \ } while (0) +#define setup_force_cpu_bug(bit) setup_force_cpu_cap(bit) + #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_X86_FAST_FEATURE_TESTS) /* * Static testing of CPU features. Used the same as boot_cpu_has(). diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index e8991d7f7034..da943411d3d8 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -163,8 +163,8 @@ extern struct cpuinfo_x86 boot_cpu_data; extern struct cpuinfo_x86 new_cpu_data; extern struct x86_hw_tss doublefault_tss; -extern __u32 cpu_caps_cleared[NCAPINTS]; -extern __u32 cpu_caps_set[NCAPINTS]; +extern __u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; +extern __u32 cpu_caps_set[NCAPINTS + NBUGINTS]; #ifdef CONFIG_SMP DECLARE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index c2eada1056de..034900623adf 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -452,8 +452,8 @@ static const char *table_lookup_model(struct cpuinfo_x86 *c) return NULL; /* Not found */ } -__u32 cpu_caps_cleared[NCAPINTS]; -__u32 cpu_caps_set[NCAPINTS]; +__u32 cpu_caps_cleared[NCAPINTS + NBUGINTS]; +__u32 cpu_caps_set[NCAPINTS + NBUGINTS]; void load_percpu_segment(int cpu) { @@ -812,7 +812,7 @@ static void apply_forced_caps(struct cpuinfo_x86 *c) { int i; - for (i = 0; i < NCAPINTS; i++) { + for (i = 0; i < NCAPINTS + NBUGINTS; i++) { c->x86_capability[i] &= ~cpu_caps_cleared[i]; c->x86_capability[i] |= cpu_caps_set[i]; } From 23b22186b27f6f16cddc69fd12b607c71de6451f Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Mon, 9 Oct 2017 11:11:49 +0200 Subject: [PATCH 0654/1013] optee: fix invalid of_node_put() in optee_driver_init() commit f044113113dd95ba73916bde10e804d3cdfa2662 upstream. The first node supplied to of_find_matching_node() has its reference counter decreased as part of call to that function. In optee_driver_init() after calling of_find_matching_node() it's invalid to call of_node_put() on the supplied node again. So remove the invalid call to of_node_put(). 
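For reference, a minimal sketch of the intended reference-counting pattern, assuming a placeholder match table and node names rather than the real OP-TEE ones:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/of.h>

static const struct of_device_id example_match[] = {
        { .compatible = "vendor,example" },
        { }
};

static int __init example_driver_init(void)
{
        struct device_node *fw_np, *np;

        fw_np = of_find_node_by_name(NULL, "firmware");  /* takes a reference */
        if (!fw_np)
                return -ENODEV;

        /*
         * of_find_matching_node() drops the reference it was handed on
         * fw_np, so an additional of_node_put(fw_np) here would underflow
         * the refcount. Only the returned node is still held by us.
         */
        np = of_find_matching_node(fw_np, example_match);
        if (!np)
                return -ENODEV;

        /* ... probe using np ... */
        of_node_put(np);
        return 0;
}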
Reported-by: Alex Shi Signed-off-by: Jens Wiklander Cc: Signed-off-by: Greg Kroah-Hartman --- drivers/tee/optee/core.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c index 7952357df9c8..edb6e4e9ef3a 100644 --- a/drivers/tee/optee/core.c +++ b/drivers/tee/optee/core.c @@ -590,7 +590,6 @@ static int __init optee_driver_init(void) return -ENODEV; np = of_find_matching_node(fw_np, optee_match); - of_node_put(fw_np); if (!np) return -ENODEV; From b04c22da18b53c7be730478282f7e6f62cef9425 Mon Sep 17 00:00:00 2001 From: Derek Basehore Date: Tue, 29 Aug 2017 13:34:34 -0700 Subject: [PATCH 0655/1013] backlight: pwm_bl: Fix overflow condition [ Upstream commit 5d0c49acebc9488e37db95f1d4a55644e545ffe7 ] This fixes an overflow condition that can happen with high max brightness and period values in compute_duty_cycle. This fixes it by using a 64 bit variable for computing the duty cycle. Signed-off-by: Derek Basehore Acked-by: Thierry Reding Reviewed-by: Brian Norris Signed-off-by: Lee Jones Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/video/backlight/pwm_bl.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c index 9bd17682655a..1c2289ddd555 100644 --- a/drivers/video/backlight/pwm_bl.c +++ b/drivers/video/backlight/pwm_bl.c @@ -79,14 +79,17 @@ static void pwm_backlight_power_off(struct pwm_bl_data *pb) static int compute_duty_cycle(struct pwm_bl_data *pb, int brightness) { unsigned int lth = pb->lth_brightness; - int duty_cycle; + u64 duty_cycle; if (pb->levels) duty_cycle = pb->levels[brightness]; else duty_cycle = brightness; - return (duty_cycle * (pb->period - lth) / pb->scale) + lth; + duty_cycle *= pb->period - lth; + do_div(duty_cycle, pb->scale); + + return duty_cycle + lth; } static int pwm_backlight_update_status(struct backlight_device *bl) From 5d583a7e2d92a723e9833604f760517a64f630cc Mon Sep 17 00:00:00 2001 From: Shashank Sharma Date: Thu, 12 Oct 2017 22:10:08 +0530 Subject: [PATCH 0656/1013] drm: Add retries for lspcon mode detection [ Upstream commit f687e25a7a245952349f1f9f9cc238ac5a3be258 ] >From the CI builds, its been observed that during a driver reload/insert, dp dual mode read function sometimes fails to read from LSPCON device over i2c-over-aux channel. This patch: - adds some delay and few retries, allowing a scope for these devices to settle down and respond. - changes one error log's level from ERROR->DEBUG as we want to call it an error only after all the retries are exhausted. 
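In isolation, the bounded-retry pattern described above looks roughly like the sketch below; the read callback is a stand-in, and the real change is the drm_dp_dual_mode_read() loop in the hunk that follows.

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Sketch only: retry a flaky read a fixed number of times before giving up. */
static int example_read_with_retries(int (*read_fn)(void *ctx, u8 *val),
                                     void *ctx, u8 *val)
{
        int ret = -EIO;
        int retry;

        for (retry = 0; retry < 6; retry++) {
                if (retry)
                        usleep_range(500, 1000); /* give the device time to settle */
                ret = read_fn(ctx, val);
                if (!ret)
                        break;
        }
        return ret;
}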
V2: Addressed review comments from Jani (for loop for retry) V3: Addressed review comments from Imre (break on partial read too) V3: Addressed review comments from Ville/Imre (Add the retries exclusively for LSPCON, not for all dp_dual_mode devices) V4: Added r-b from Imre, sending it to dri-devel (Jani) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102294 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102295 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102359 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103186 Cc: Ville Syrjala Cc: Imre Deak Cc: Jani Nikula Reviewed-by: Imre Deak Acked-by: Dave Airlie Signed-off-by: Shashank Sharma Signed-off-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/1507826408-19322-1-git-send-email-shashank.sharma@intel.com Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/drm_dp_dual_mode_helper.c | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_dp_dual_mode_helper.c b/drivers/gpu/drm/drm_dp_dual_mode_helper.c index 0ef9011a1856..02a50929af67 100644 --- a/drivers/gpu/drm/drm_dp_dual_mode_helper.c +++ b/drivers/gpu/drm/drm_dp_dual_mode_helper.c @@ -410,6 +410,7 @@ int drm_lspcon_get_mode(struct i2c_adapter *adapter, { u8 data; int ret = 0; + int retry; if (!mode) { DRM_ERROR("NULL input\n"); @@ -417,10 +418,19 @@ int drm_lspcon_get_mode(struct i2c_adapter *adapter, } /* Read Status: i2c over aux */ - ret = drm_dp_dual_mode_read(adapter, DP_DUAL_MODE_LSPCON_CURRENT_MODE, - &data, sizeof(data)); + for (retry = 0; retry < 6; retry++) { + if (retry) + usleep_range(500, 1000); + + ret = drm_dp_dual_mode_read(adapter, + DP_DUAL_MODE_LSPCON_CURRENT_MODE, + &data, sizeof(data)); + if (!ret) + break; + } + if (ret < 0) { - DRM_ERROR("LSPCON read(0x80, 0x41) failed\n"); + DRM_DEBUG_KMS("LSPCON read(0x80, 0x41) failed\n"); return -EFAULT; } From a7455b113feff0b99cad533c63bc6b877c1a1960 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Thu, 12 Oct 2017 16:36:58 +0800 Subject: [PATCH 0657/1013] clk: sunxi-ng: nm: Check if requested rate is supported by fractional clock [ Upstream commit 4cdbc40d64d4b8303a97e29a52862e4d99502beb ] The round_rate callback for N-M-factor style clocks does not check if the requested clock rate is supported by the fractional clock mode. While this doesn't affect usage in practice, since the clock rates are also supported through N-M factors, it does not match the set_rate code. Add a check to the round_rate callback so it matches the set_rate callback. 
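The invariant behind this fix, in toy form: round_rate() and set_rate() must classify a requested rate the same way, otherwise set_rate() can take the fractional path for a rate that round_rate() has already rounded to an N-M approximation. The helper names and rates below are illustrative only, not the ccu_nm code.

#include <linux/errno.h>
#include <linux/types.h>

/* Toy "fractional mode" that only supports two fixed audio rates. */
static bool toy_frac_has_rate(unsigned long rate)
{
        return rate == 22579200UL || rate == 24576000UL;
}

static unsigned long toy_nm_round(unsigned long parent, unsigned long rate)
{
        unsigned long div = (parent / rate) ? (parent / rate) : 1;

        return parent / div;    /* crude integer-divider approximation */
}

static long toy_round_rate(unsigned long parent, unsigned long rate)
{
        if (!rate)
                return -EINVAL;

        /* Mirror the check that the set_rate path performs. */
        if (toy_frac_has_rate(rate))
                return rate;

        return toy_nm_round(parent, rate);
}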
Fixes: 6174a1e24b0d ("clk: sunxi-ng: Add N-M-factor clock support") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/sunxi-ng/ccu_nm.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/clk/sunxi-ng/ccu_nm.c b/drivers/clk/sunxi-ng/ccu_nm.c index a32158e8f2e3..84a5e7f17f6f 100644 --- a/drivers/clk/sunxi-ng/ccu_nm.c +++ b/drivers/clk/sunxi-ng/ccu_nm.c @@ -99,6 +99,9 @@ static long ccu_nm_round_rate(struct clk_hw *hw, unsigned long rate, struct ccu_nm *nm = hw_to_ccu_nm(hw); struct _ccu_nm _nm; + if (ccu_frac_helper_has_rate(&nm->common, &nm->frac, rate)) + return rate; + _nm.min_n = nm->n.min ?: 1; _nm.max_n = nm->n.max ?: 1 << nm->n.width; _nm.min_m = 1; From 9fe2989cdf3d433ab07d9e5979c5a811e26dd925 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Thu, 12 Oct 2017 16:36:57 +0800 Subject: [PATCH 0658/1013] clk: sunxi-ng: sun5i: Fix bit offset of audio PLL post-divider [ Upstream commit d51fe3ba9773c8b6fc79f82bbe75d64baf604292 ] The post-divider for the audio PLL is in bits [29:26], as specified in the user manual, not [19:16] as currently programmed in the code. The post-divider has a default register value of 2, i.e. a divider of 3. This means the clock rate fed to the audio codec would be off. This was discovered when porting sigma-delta modulation for the PLL to sun5i, which needs the post-divider to be 1. Fix the bit offset, so we do actually force the post-divider to a certain value. Fixes: 5e73761786d6 ("clk: sunxi-ng: Add sun5i CCU driver") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/sunxi-ng/ccu-sun5i.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/clk/sunxi-ng/ccu-sun5i.c b/drivers/clk/sunxi-ng/ccu-sun5i.c index ab9e850b3707..2f385a57cd91 100644 --- a/drivers/clk/sunxi-ng/ccu-sun5i.c +++ b/drivers/clk/sunxi-ng/ccu-sun5i.c @@ -982,8 +982,8 @@ static void __init sun5i_ccu_init(struct device_node *node, /* Force the PLL-Audio-1x divider to 4 */ val = readl(reg + SUN5I_PLL_AUDIO_REG); - val &= ~GENMASK(19, 16); - writel(val | (3 << 16), reg + SUN5I_PLL_AUDIO_REG); + val &= ~GENMASK(29, 26); + writel(val | (3 << 26), reg + SUN5I_PLL_AUDIO_REG); /* * Use the peripheral PLL as the AHB parent, instead of CPU / From 714abd2d6996bb36b334dd15c6c52d3ad00766a0 Mon Sep 17 00:00:00 2001 From: Christian Lamparter Date: Wed, 4 Oct 2017 01:00:08 +0200 Subject: [PATCH 0659/1013] crypto: crypto4xx - increase context and scatter ring buffer elements [ Upstream commit 778f81d6cdb7d25360f082ac0384d5103f04eca5 ] If crypto4xx is used in conjunction with dm-crypt, the available ring buffer elements are not enough to handle the load properly. On an aes-cbc-essiv:sha256 encrypted swap partition the read performance is abyssal: (tested with hdparm -t) /dev/mapper/swap_crypt: Timing buffered disk reads: 14 MB in 3.68 seconds = 3.81 MB/sec The patch increases both PPC4XX_NUM_SD and PPC4XX_NUM_PD to 256. This improves the performance considerably: /dev/mapper/swap_crypt: Timing buffered disk reads: 104 MB in 3.03 seconds = 34.31 MB/sec Furthermore, PPC4XX_LAST_SD, PPC4XX_LAST_GD and PPC4XX_LAST_PD can be easily calculated from their respective PPC4XX_NUM_* constant. 
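The "easily calculated" relationship reads, in sketch form, something like the following (constant and helper names are illustrative, not the driver's):

#include <linux/types.h>

/* One hand-maintained size per ring; the last valid index is derived. */
#define EXAMPLE_NUM_PD          256
#define EXAMPLE_LAST_PD         (EXAMPLE_NUM_PD - 1)

static inline u32 example_next_pd(u32 idx)
{
        /* wrap the ring index without a second independently maintained constant */
        return (idx == EXAMPLE_LAST_PD) ? 0 : idx + 1;
}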
Signed-off-by: Christian Lamparter Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/amcc/crypto4xx_core.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/crypto/amcc/crypto4xx_core.h b/drivers/crypto/amcc/crypto4xx_core.h index ecfdcfe3698d..4f41d6da5acc 100644 --- a/drivers/crypto/amcc/crypto4xx_core.h +++ b/drivers/crypto/amcc/crypto4xx_core.h @@ -34,12 +34,12 @@ #define PPC405EX_CE_RESET 0x00000008 #define CRYPTO4XX_CRYPTO_PRIORITY 300 -#define PPC4XX_LAST_PD 63 -#define PPC4XX_NUM_PD 64 -#define PPC4XX_LAST_GD 1023 +#define PPC4XX_NUM_PD 256 +#define PPC4XX_LAST_PD (PPC4XX_NUM_PD - 1) #define PPC4XX_NUM_GD 1024 -#define PPC4XX_LAST_SD 63 -#define PPC4XX_NUM_SD 64 +#define PPC4XX_LAST_GD (PPC4XX_NUM_GD - 1) +#define PPC4XX_NUM_SD 256 +#define PPC4XX_LAST_SD (PPC4XX_NUM_SD - 1) #define PPC4XX_SD_BUFFER_SIZE 2048 #define PD_ENTRY_INUSE 1 From bd5139895727bcf97a411bf17fc137fb0d57f55c Mon Sep 17 00:00:00 2001 From: Christophe Jaillet Date: Sun, 8 Oct 2017 11:39:49 +0200 Subject: [PATCH 0660/1013] crypto: lrw - Fix an error handling path in 'create()' [ Upstream commit 616129cc6e75fb4da6681c16c981fa82dfe5e4c7 ] All error handling paths 'goto err_drop_spawn' except this one. In order to avoid some resources leak, we should do it as well here. Fixes: 700cb3f5fe75 ("crypto: lrw - Convert to skcipher") Signed-off-by: Christophe JAILLET Signed-off-by: Herbert Xu Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- crypto/lrw.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/crypto/lrw.c b/crypto/lrw.c index a8bfae4451bf..eb681e9fe574 100644 --- a/crypto/lrw.c +++ b/crypto/lrw.c @@ -610,8 +610,10 @@ static int create(struct crypto_template *tmpl, struct rtattr **tb) ecb_name[len - 1] = 0; if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, - "lrw(%s)", ecb_name) >= CRYPTO_MAX_ALG_NAME) - return -ENAMETOOLONG; + "lrw(%s)", ecb_name) >= CRYPTO_MAX_ALG_NAME) { + err = -ENAMETOOLONG; + goto err_drop_spawn; + } } inst->alg.base.cra_flags = alg->base.cra_flags & CRYPTO_ALG_ASYNC; From af826fdfb14c51fc4416993d0fcef8f4aa43b54c Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 29 Sep 2017 11:22:15 +0100 Subject: [PATCH 0661/1013] rtc: pl031: make interrupt optional [ Upstream commit 5b64a2965dfdfca8039e93303c64e2b15c19ff0c ] On some platforms, the interrupt for the PL031 is optional. Avoid trying to claim the interrupt if it's not specified. 
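The optional-interrupt pattern, reduced to a sketch (the name string and handler wiring are placeholders; the actual change is in the pl031 probe/remove hunks below):

#include <linux/errno.h>
#include <linux/interrupt.h>

/* IRQ number 0 is treated as "no interrupt wired up" and is simply skipped. */
static int example_setup_optional_irq(unsigned int irq, irq_handler_t handler,
                                      void *data)
{
        if (!irq)
                return 0;       /* alarm-less / polling-only operation */

        return request_irq(irq, handler, 0, "example", data);
}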
Reviewed-by: Linus Walleij Signed-off-by: Russell King Signed-off-by: Alexandre Belloni Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-pl031.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/rtc/rtc-pl031.c b/drivers/rtc/rtc-pl031.c index e1687e19c59f..a30f24cb6c83 100644 --- a/drivers/rtc/rtc-pl031.c +++ b/drivers/rtc/rtc-pl031.c @@ -308,7 +308,8 @@ static int pl031_remove(struct amba_device *adev) dev_pm_clear_wake_irq(&adev->dev); device_init_wakeup(&adev->dev, false); - free_irq(adev->irq[0], ldata); + if (adev->irq[0]) + free_irq(adev->irq[0], ldata); rtc_device_unregister(ldata->rtc); iounmap(ldata->base); kfree(ldata); @@ -381,12 +382,13 @@ static int pl031_probe(struct amba_device *adev, const struct amba_id *id) goto out_no_rtc; } - if (request_irq(adev->irq[0], pl031_interrupt, - vendor->irqflags, "rtc-pl031", ldata)) { - ret = -EIO; - goto out_no_irq; + if (adev->irq[0]) { + ret = request_irq(adev->irq[0], pl031_interrupt, + vendor->irqflags, "rtc-pl031", ldata); + if (ret) + goto out_no_irq; + dev_pm_set_wake_irq(&adev->dev, adev->irq[0]); } - dev_pm_set_wake_irq(&adev->dev, adev->irq[0]); return 0; out_no_irq: From f3a68b4b82f3f19b80b84cf11753082e7fb6ec8f Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Thu, 5 Oct 2017 18:07:24 -0700 Subject: [PATCH 0662/1013] kvm, mm: account kvm related kmem slabs to kmemcg [ Upstream commit 46bea48ac241fe0b413805952dda74dd0c09ba8b ] The kvm slabs can consume a significant amount of system memory and indeed in our production environment we have observed that a lot of machines are spending significant amount of memory that can not be left as system memory overhead. Also the allocations from these slabs can be triggered directly by user space applications which has access to kvm and thus a buggy application can leak such memory. So, these caches should be accounted to kmemcg. Signed-off-by: Shakeel Butt Signed-off-by: Paolo Bonzini Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/mmu.c | 4 ++-- virt/kvm/kvm_main.c | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 7a69cf053711..13ebeedcec07 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -5476,13 +5476,13 @@ int kvm_mmu_module_init(void) pte_list_desc_cache = kmem_cache_create("pte_list_desc", sizeof(struct pte_list_desc), - 0, 0, NULL); + 0, SLAB_ACCOUNT, NULL); if (!pte_list_desc_cache) goto nomem; mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header", sizeof(struct kvm_mmu_page), - 0, 0, NULL); + 0, SLAB_ACCOUNT, NULL); if (!mmu_page_header_cache) goto nomem; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 484e8820c382..2447d7c017e7 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -4018,7 +4018,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, if (!vcpu_align) vcpu_align = __alignof__(struct kvm_vcpu); kvm_vcpu_cache = kmem_cache_create("kvm_vcpu", vcpu_size, vcpu_align, - 0, NULL); + SLAB_ACCOUNT, NULL); if (!kvm_vcpu_cache) { r = -ENOMEM; goto out_free_3; From a5171fe705fb31bf3a0d93e8d69285d2b5b691c9 Mon Sep 17 00:00:00 2001 From: Dan Murphy Date: Tue, 10 Oct 2017 12:42:56 -0500 Subject: [PATCH 0663/1013] net: phy: at803x: Change error to EINVAL for invalid MAC [ Upstream commit fc7556877d1748ac00958822a0a3bba1d4bd9e0d ] Change the return error code to EINVAL if the MAC address is not valid in the set_wol function. 
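The rationale for the change: -EFAULT conventionally signals a bad address during a user/kernel copy, while -EINVAL signals a semantically invalid argument, which is what a malformed MAC address is. A minimal sketch of the check, assuming a hypothetical helper name:

#include <linux/errno.h>
#include <linux/etherdevice.h>

static int example_validate_wol_mac(const u8 *mac)
{
        /* is_valid_ether_addr() rejects all-zero and multicast addresses. */
        if (!is_valid_ether_addr(mac))
                return -EINVAL;

        return 0;
}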
Signed-off-by: Dan Murphy Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/at803x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/at803x.c b/drivers/net/phy/at803x.c index c1e52b9dc58d..5f93e6add563 100644 --- a/drivers/net/phy/at803x.c +++ b/drivers/net/phy/at803x.c @@ -167,7 +167,7 @@ static int at803x_set_wol(struct phy_device *phydev, mac = (const u8 *) ndev->dev_addr; if (!is_valid_ether_addr(mac)) - return -EFAULT; + return -EINVAL; for (i = 0; i < 3; i++) { phy_write(phydev, AT803X_MMD_ACCESS_CONTROL, From bfd66a406fe7e590055c1d6714adc697f18664c8 Mon Sep 17 00:00:00 2001 From: David Daney Date: Fri, 8 Sep 2017 10:10:31 +0200 Subject: [PATCH 0664/1013] PCI: Avoid bus reset if bridge itself is broken [ Upstream commit 357027786f3523d26f42391aa4c075b8495e5d28 ] When checking to see if a PCI bus can safely be reset, we previously checked to see if any of the children had their PCI_DEV_FLAGS_NO_BUS_RESET flag set. Children marked with that flag are known not to behave well after a bus reset. Some PCIe root port bridges also do not behave well after a bus reset, sometimes causing the devices behind the bridge to become unusable. Add a check for PCI_DEV_FLAGS_NO_BUS_RESET being set in the bridge device to allow these bridges to be flagged, and prevent their secondary buses from being reset. Signed-off-by: David Daney [jglauber@cavium.com: fixed typo] Signed-off-by: Jan Glauber Signed-off-by: Bjorn Helgaas Reviewed-by: Alex Williamson Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/pci.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 6078dfc11b11..74f1c57ab93b 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4356,6 +4356,10 @@ static bool pci_bus_resetable(struct pci_bus *bus) { struct pci_dev *dev; + + if (bus->self && (bus->self->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)) + return false; + list_for_each_entry(dev, &bus->devices, bus_list) { if (dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET || (dev->subordinate && !pci_bus_resetable(dev->subordinate))) From 3aaaf02c110f36dc716780412add4799b2a89e5b Mon Sep 17 00:00:00 2001 From: Varun Prakash Date: Wed, 11 Oct 2017 19:33:07 +0530 Subject: [PATCH 0665/1013] scsi: cxgb4i: fix Tx skb leak [ Upstream commit 9b3a081fb62158b50bcc90522ca2423017544367 ] In case of connection reset Tx skb queue can have some skbs which are not transmitted so purge Tx skb queue in release_offload_resources() to avoid skb leak. Signed-off-by: Varun Prakash Signed-off-by: Martin K. 
Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/cxgbi/cxgb4i/cxgb4i.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c index 1d02cf9fe06c..30d5f0ef29bb 100644 --- a/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c +++ b/drivers/scsi/cxgbi/cxgb4i/cxgb4i.c @@ -1575,6 +1575,7 @@ static void release_offload_resources(struct cxgbi_sock *csk) csk, csk->state, csk->flags, csk->tid); cxgbi_sock_free_cpl_skbs(csk); + cxgbi_sock_purge_write_queue(csk); if (csk->wr_cred != csk->wr_max_cred) { cxgbi_sock_purge_wr_queue(csk); cxgbi_sock_reset_wr_list(csk); From 6d95d05bafbafc88a13d81e7ad63fcb899de499d Mon Sep 17 00:00:00 2001 From: Sreekanth Reddy Date: Tue, 10 Oct 2017 18:41:18 +0530 Subject: [PATCH 0666/1013] scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive [ Upstream commit 2ce9a3645299ba1752873d333d73f67620f4550b ] Whenever an I/O for a RAID volume fails with IOCStatus MPI2_IOCSTATUS_SCSI_IOC_TERMINATED and SCSIStatus equal to (MPI2_SCSI_STATE_TERMINATED | MPI2_SCSI_STATE_NO_SCSI_STATUS) then return the I/O to SCSI midlayer with "DID_RESET" (i.e. retry the IO infinite times) set in the host byte. Previously, the driver was completing the I/O with "DID_SOFT_ERROR" which causes the I/O to be quickly retried. However, firmware needed more time and hence I/Os were failing. Signed-off-by: Sreekanth Reddy Reviewed-by: Tomas Henzl Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c index 22998cbd538f..33ff691878e2 100644 --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -4804,6 +4804,11 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply) } else if (log_info == VIRTUAL_IO_FAILED_RETRY) { scmd->result = DID_RESET << 16; break; + } else if ((scmd->device->channel == RAID_CHANNEL) && + (scsi_state == (MPI2_SCSI_STATE_TERMINATED | + MPI2_SCSI_STATE_NO_SCSI_STATUS))) { + scmd->result = DID_RESET << 16; + break; } scmd->result = DID_SOFT_ERROR << 16; break; From 7af9f9cd68c7b6f009c4a0c0d8ea7703aa46a26b Mon Sep 17 00:00:00 2001 From: Stuart Hayes Date: Wed, 4 Oct 2017 10:57:52 -0500 Subject: [PATCH 0667/1013] PCI: Create SR-IOV virtfn/physfn links before attaching driver [ Upstream commit 27d6162944b9b34c32cd5841acd21786637ee743 ] When creating virtual functions, create the "virtfn%u" and "physfn" links in sysfs *before* attaching the driver instead of after. When we attach the driver to the new virtual network interface first, there is a race when the driver attaches to the new sends out an "add" udev event, and the network interface naming software (biosdevname or systemd, for example) tries to look at these links. 
Signed-off-by: Stuart Hayes Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/iov.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index ac41c8be9200..0fd8e164339c 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -162,7 +162,6 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset) pci_device_add(virtfn, virtfn->bus); - pci_bus_add_device(virtfn); sprintf(buf, "virtfn%u", id); rc = sysfs_create_link(&dev->dev.kobj, &virtfn->dev.kobj, buf); if (rc) @@ -173,6 +172,8 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id, int reset) kobject_uevent(&virtfn->dev.kobj, KOBJ_CHANGE); + pci_bus_add_device(virtfn); + return 0; failed2: From 349384cd7028affdb23634b56d10da015b080cba Mon Sep 17 00:00:00 2001 From: Fabio Estevam Date: Fri, 29 Sep 2017 14:39:49 -0300 Subject: [PATCH 0668/1013] PM / OPP: Move error message to debug level [ Upstream commit 035ed07208dc501d023873447113f3f178592156 ] On some i.MX6 platforms which do not have speed grading check, opp table will not be created in platform code, so cpufreq driver prints the following error message: cpu cpu0: dev_pm_opp_get_opp_count: OPP table not found (-19) However, this is not really an error in this case because the imx6q-cpufreq driver first calls dev_pm_opp_get_opp_count() and if it fails, it means that platform code does not provide OPP and then dev_pm_opp_of_add_table() will be called. In order to avoid such confusing error message, move it to debug level. It is up to the caller of dev_pm_opp_get_opp_count() to check its return value and decide if it will print an error or not. Signed-off-by: Fabio Estevam Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/base/power/opp/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/base/power/opp/core.c b/drivers/base/power/opp/core.c index a6de32530693..0459b1204694 100644 --- a/drivers/base/power/opp/core.c +++ b/drivers/base/power/opp/core.c @@ -296,7 +296,7 @@ int dev_pm_opp_get_opp_count(struct device *dev) opp_table = _find_opp_table(dev); if (IS_ERR(opp_table)) { count = PTR_ERR(opp_table); - dev_err(dev, "%s: OPP table not found (%d)\n", + dev_dbg(dev, "%s: OPP table not found (%d)\n", __func__, count); return count; } From 66efe26b0b074ef11575b883fc9a93af51aab597 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 27 Aug 2017 08:39:51 +0200 Subject: [PATCH 0669/1013] igb: check memory allocation failure [ Upstream commit 18eb86362a52f0af933cc0fd5e37027317eb2d1c ] Check memory allocation failures and return -ENOMEM in such cases, as already done for other memory allocations in this function. This avoids NULL pointers dereference. 
Signed-off-by: Christophe JAILLET Tested-by: Aaron Brown Acked-by: PJ Waskiewicz Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/igb/igb_main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index b0031c5ff767..667dbc7d4a4e 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -3162,6 +3162,8 @@ static int igb_sw_init(struct igb_adapter *adapter) /* Setup and initialize a copy of the hw vlan table array */ adapter->shadow_vfta = kcalloc(E1000_VLAN_FILTER_TBL_SIZE, sizeof(u32), GFP_ATOMIC); + if (!adapter->shadow_vfta) + return -ENOMEM; /* This call may decrease the number of queues */ if (igb_init_interrupt_scheme(adapter, true)) { From c817cb56b8d62cf2677988cf5fa70cc515a526e1 Mon Sep 17 00:00:00 2001 From: Lihong Yang Date: Thu, 7 Sep 2017 08:05:46 -0400 Subject: [PATCH 0670/1013] i40e: use the safe hash table iterator when deleting mac filters [ Upstream commit 784548c40d6f43eff2297220ad7800dc04be03c6 ] This patch replaces hash_for_each function with hash_for_each_safe when calling __i40e_del_filter. The hash_for_each_safe function is the right one to use when iterating over a hash table to safely remove a hash entry. Otherwise, incorrect values may be read from freed memory. Detected by CoverityScan, CID 1402048 Read from pointer after free Signed-off-by: Lihong Yang Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 4d1e670f490e..fa25ea512e43 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -2779,6 +2779,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) struct i40e_mac_filter *f; struct i40e_vf *vf; int ret = 0; + struct hlist_node *h; int bkt; /* validate the request */ @@ -2817,7 +2818,7 @@ int i40e_ndo_set_vf_mac(struct net_device *netdev, int vf_id, u8 *mac) /* Delete all the filters for this VSI - we're going to kill it * anyway. */ - hash_for_each(vsi->mac_filter_hash, bkt, f, hlist) + hash_for_each_safe(vsi->mac_filter_hash, bkt, h, f, hlist) __i40e_del_filter(vsi, f); spin_unlock_bh(&vsi->mac_filter_hash_lock); From d1f13dcad56bdd655a29f33b5ac84e0fae48a4eb Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Wed, 30 Aug 2017 13:50:39 +0200 Subject: [PATCH 0671/1013] iio: st_sensors: add register mask for status register [ Upstream commit e72a060151e5bb673af24993665e270fc4f674a7 ] Introduce register mask for data-ready status register since pressure sensors (e.g. LPS22HB) export just two channels (BIT(0) and BIT(1)) and BIT(2) is marked reserved while in st_sensors_new_samples_available() value read from status register is masked using 0x7. Moreover do not mask status register using active_scan_mask since now status value is properly masked and if the result is not zero the interrupt has to be consumed by the driver. This fix an issue on LPS25H and LPS331AP where channel definition is swapped respect to status register. 
Furthermore that change allows to properly support new devices (e.g LIS2DW12) that report just ZYXDA (data-ready) field in status register to figure out if the interrupt has been generated by the device. Fixes: 97865fe41322 (iio: st_sensors: verify interrupt event to status) Signed-off-by: Lorenzo Bianconi Reviewed-by: Linus Walleij Signed-off-by: Jonathan Cameron Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/iio/accel/st_accel_core.c | 35 +++++++++++++++---- .../iio/common/st_sensors/st_sensors_core.c | 2 +- .../common/st_sensors/st_sensors_trigger.c | 16 +++------ drivers/iio/gyro/st_gyro_core.c | 15 ++++++-- drivers/iio/magnetometer/st_magn_core.c | 10 ++++-- drivers/iio/pressure/st_pressure_core.c | 15 ++++++-- include/linux/iio/common/st_sensors.h | 7 ++-- 7 files changed, 70 insertions(+), 30 deletions(-) diff --git a/drivers/iio/accel/st_accel_core.c b/drivers/iio/accel/st_accel_core.c index 752856b3a849..379de1829cdb 100644 --- a/drivers/iio/accel/st_accel_core.c +++ b/drivers/iio/accel/st_accel_core.c @@ -164,7 +164,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .mask_int2 = 0x00, .addr_ihl = 0x25, .mask_ihl = 0x02, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .sim = { .addr = 0x23, @@ -236,7 +239,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .mask_ihl = 0x80, .addr_od = 0x22, .mask_od = 0x40, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .sim = { .addr = 0x23, @@ -318,7 +324,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .mask_int2 = 0x00, .addr_ihl = 0x23, .mask_ihl = 0x40, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, .ig1 = { .en_addr = 0x23, .en_mask = 0x08, @@ -389,7 +398,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .drdy_irq = { .addr = 0x21, .mask_int1 = 0x04, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .sim = { .addr = 0x21, @@ -451,7 +463,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .mask_ihl = 0x80, .addr_od = 0x22, .mask_od = 0x40, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .sim = { .addr = 0x21, @@ -569,7 +584,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .drdy_irq = { .addr = 0x21, .mask_int1 = 0x04, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .sim = { .addr = 0x21, @@ -640,7 +658,10 @@ static const struct st_sensor_settings st_accel_sensors_settings[] = { .mask_int2 = 0x00, .addr_ihl = 0x25, .mask_ihl = 0x02, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .sim = { .addr = 0x23, diff --git a/drivers/iio/common/st_sensors/st_sensors_core.c b/drivers/iio/common/st_sensors/st_sensors_core.c index 02e833b14db0..34115f05d5c4 100644 --- a/drivers/iio/common/st_sensors/st_sensors_core.c +++ b/drivers/iio/common/st_sensors/st_sensors_core.c @@ -470,7 +470,7 @@ int st_sensors_set_dataready_irq(struct iio_dev *indio_dev, bool enable) * different one. 
Take into account irq status register * to understand if irq trigger can be properly supported */ - if (sdata->sensor_settings->drdy_irq.addr_stat_drdy) + if (sdata->sensor_settings->drdy_irq.stat_drdy.addr) sdata->hw_irq_trigger = enable; return 0; } diff --git a/drivers/iio/common/st_sensors/st_sensors_trigger.c b/drivers/iio/common/st_sensors/st_sensors_trigger.c index fa73e6795359..fdcc5a891958 100644 --- a/drivers/iio/common/st_sensors/st_sensors_trigger.c +++ b/drivers/iio/common/st_sensors/st_sensors_trigger.c @@ -31,7 +31,7 @@ static int st_sensors_new_samples_available(struct iio_dev *indio_dev, int ret; /* How would I know if I can't check it? */ - if (!sdata->sensor_settings->drdy_irq.addr_stat_drdy) + if (!sdata->sensor_settings->drdy_irq.stat_drdy.addr) return -EINVAL; /* No scan mask, no interrupt */ @@ -39,23 +39,15 @@ static int st_sensors_new_samples_available(struct iio_dev *indio_dev, return 0; ret = sdata->tf->read_byte(&sdata->tb, sdata->dev, - sdata->sensor_settings->drdy_irq.addr_stat_drdy, + sdata->sensor_settings->drdy_irq.stat_drdy.addr, &status); if (ret < 0) { dev_err(sdata->dev, "error checking samples available\n"); return ret; } - /* - * the lower bits of .active_scan_mask[0] is directly mapped - * to the channels on the sensor: either bit 0 for - * one-dimensional sensors, or e.g. x,y,z for accelerometers, - * gyroscopes or magnetometers. No sensor use more than 3 - * channels, so cut the other status bits here. - */ - status &= 0x07; - if (status & (u8)indio_dev->active_scan_mask[0]) + if (status & sdata->sensor_settings->drdy_irq.stat_drdy.mask) return 1; return 0; @@ -212,7 +204,7 @@ int st_sensors_allocate_trigger(struct iio_dev *indio_dev, * it was "our" interrupt. */ if (sdata->int_pin_open_drain && - sdata->sensor_settings->drdy_irq.addr_stat_drdy) + sdata->sensor_settings->drdy_irq.stat_drdy.addr) irq_trig |= IRQF_SHARED; err = request_threaded_irq(sdata->get_irq_data_ready(indio_dev), diff --git a/drivers/iio/gyro/st_gyro_core.c b/drivers/iio/gyro/st_gyro_core.c index e366422e8512..2536a8400c98 100644 --- a/drivers/iio/gyro/st_gyro_core.c +++ b/drivers/iio/gyro/st_gyro_core.c @@ -118,7 +118,10 @@ static const struct st_sensor_settings st_gyro_sensors_settings[] = { * drain settings, but only for INT1 and not * for the DRDY line on INT2. */ - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .multi_read_bit = true, .bootime = 2, @@ -188,7 +191,10 @@ static const struct st_sensor_settings st_gyro_sensors_settings[] = { * drain settings, but only for INT1 and not * for the DRDY line on INT2. */ - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .multi_read_bit = true, .bootime = 2, @@ -253,7 +259,10 @@ static const struct st_sensor_settings st_gyro_sensors_settings[] = { * drain settings, but only for INT1 and not * for the DRDY line on INT2. 
*/ - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .multi_read_bit = true, .bootime = 2, diff --git a/drivers/iio/magnetometer/st_magn_core.c b/drivers/iio/magnetometer/st_magn_core.c index 08aafba4481c..19031a7bce23 100644 --- a/drivers/iio/magnetometer/st_magn_core.c +++ b/drivers/iio/magnetometer/st_magn_core.c @@ -317,7 +317,10 @@ static const struct st_sensor_settings st_magn_sensors_settings[] = { }, .drdy_irq = { /* drdy line is routed drdy pin */ - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x07, + }, }, .multi_read_bit = true, .bootime = 2, @@ -361,7 +364,10 @@ static const struct st_sensor_settings st_magn_sensors_settings[] = { .drdy_irq = { .addr = 0x62, .mask_int1 = 0x01, - .addr_stat_drdy = 0x67, + .stat_drdy = { + .addr = 0x67, + .mask = 0x07, + }, }, .multi_read_bit = false, .bootime = 2, diff --git a/drivers/iio/pressure/st_pressure_core.c b/drivers/iio/pressure/st_pressure_core.c index 34611a8ea2ce..ea075fcd5a6f 100644 --- a/drivers/iio/pressure/st_pressure_core.c +++ b/drivers/iio/pressure/st_pressure_core.c @@ -287,7 +287,10 @@ static const struct st_sensor_settings st_press_sensors_settings[] = { .mask_ihl = 0x80, .addr_od = 0x22, .mask_od = 0x40, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x03, + }, }, .multi_read_bit = true, .bootime = 2, @@ -395,7 +398,10 @@ static const struct st_sensor_settings st_press_sensors_settings[] = { .mask_ihl = 0x80, .addr_od = 0x22, .mask_od = 0x40, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x03, + }, }, .multi_read_bit = true, .bootime = 2, @@ -454,7 +460,10 @@ static const struct st_sensor_settings st_press_sensors_settings[] = { .mask_ihl = 0x80, .addr_od = 0x12, .mask_od = 0x40, - .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR, + .stat_drdy = { + .addr = ST_SENSORS_DEFAULT_STAT_ADDR, + .mask = 0x03, + }, }, .multi_read_bit = false, .bootime = 2, diff --git a/include/linux/iio/common/st_sensors.h b/include/linux/iio/common/st_sensors.h index 7b0fa8b5c120..ce0ef1c0a30a 100644 --- a/include/linux/iio/common/st_sensors.h +++ b/include/linux/iio/common/st_sensors.h @@ -139,7 +139,7 @@ struct st_sensor_das { * @mask_ihl: mask to enable/disable active low on the INT lines. * @addr_od: address to enable/disable Open Drain on the INT lines. * @mask_od: mask to enable/disable Open Drain on the INT lines. - * @addr_stat_drdy: address to read status of DRDY (data ready) interrupt + * struct stat_drdy - status register of DRDY (data ready) interrupt. * struct ig1 - represents the Interrupt Generator 1 of sensors. * @en_addr: address of the enable ig1 register. * @en_mask: mask to write the on/off value for enable. @@ -152,7 +152,10 @@ struct st_sensor_data_ready_irq { u8 mask_ihl; u8 addr_od; u8 mask_od; - u8 addr_stat_drdy; + struct { + u8 addr; + u8 mask; + } stat_drdy; struct { u8 en_addr; u8 en_mask; From afccf6f360df511a432d4c2b99f1eb8d9ac369e6 Mon Sep 17 00:00:00 2001 From: Emil Tantilov Date: Mon, 11 Sep 2017 14:21:31 -0700 Subject: [PATCH 0672/1013] ixgbe: fix use of uninitialized padding [ Upstream commit dcfd6b839c998bc9838e2a47f44f37afbdf3099c ] This patch is resolving Coverity hits where padding in a structure could be used uninitialized. 
- Initialize fwd_cmd.pad/2 before ixgbe_calculate_checksum() - Initialize buffer.pad2/3 before ixgbe_hic_unlocked() Signed-off-by: Emil Tantilov Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 4 ++-- drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c index 6e6ab6f6875e..64429a14c630 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c @@ -3781,10 +3781,10 @@ s32 ixgbe_set_fw_drv_ver_generic(struct ixgbe_hw *hw, u8 maj, u8 min, fw_cmd.ver_build = build; fw_cmd.ver_sub = sub; fw_cmd.hdr.checksum = 0; - fw_cmd.hdr.checksum = ixgbe_calculate_checksum((u8 *)&fw_cmd, - (FW_CEM_HDR_LEN + fw_cmd.hdr.buf_len)); fw_cmd.pad = 0; fw_cmd.pad2 = 0; + fw_cmd.hdr.checksum = ixgbe_calculate_checksum((u8 *)&fw_cmd, + (FW_CEM_HDR_LEN + fw_cmd.hdr.buf_len)); for (i = 0; i <= FW_CEM_MAX_RETRIES; i++) { ret_val = ixgbe_host_interface_command(hw, &fw_cmd, diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c index 19fbb2f28ea4..8a85217845ae 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c @@ -900,6 +900,8 @@ static s32 ixgbe_read_ee_hostif_buffer_X550(struct ixgbe_hw *hw, /* convert offset from words to bytes */ buffer.address = cpu_to_be32((offset + current_word) * 2); buffer.length = cpu_to_be16(words_to_read * 2); + buffer.pad2 = 0; + buffer.pad3 = 0; status = ixgbe_hic_unlocked(hw, (u32 *)&buffer, sizeof(buffer), IXGBE_HI_COMMAND_TIMEOUT); From afdbec5d3c652e8d731861c460eee96ef9498315 Mon Sep 17 00:00:00 2001 From: Colin Ian King Date: Fri, 8 Sep 2017 15:37:45 +0100 Subject: [PATCH 0673/1013] IB/rxe: check for allocation failure on elem [ Upstream commit 4831ca9e4a8e48cb27e0a792f73250390827a228 ] The allocation for elem may fail (especially because we're using GFP_ATOMIC) so best to check for a null return. This fixes a potential null pointer dereference when assigning elem->pool. Detected by CoverityScan CID#1357507 ("Dereference null return value") Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Colin Ian King Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/sw/rxe/rxe_pool.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/infiniband/sw/rxe/rxe_pool.c b/drivers/infiniband/sw/rxe/rxe_pool.c index c1b5f38f31a5..3b4916680018 100644 --- a/drivers/infiniband/sw/rxe/rxe_pool.c +++ b/drivers/infiniband/sw/rxe/rxe_pool.c @@ -404,6 +404,8 @@ void *rxe_alloc(struct rxe_pool *pool) elem = kmem_cache_zalloc(pool_cache(pool), (pool->flags & RXE_POOL_ATOMIC) ? GFP_ATOMIC : GFP_KERNEL); + if (!elem) + return NULL; elem->pool = pool; kref_init(&elem->ref_cnt); From 7535afccf97c9dd1559e0b2d7586a522020e5522 Mon Sep 17 00:00:00 2001 From: Luca Miccio Date: Mon, 9 Oct 2017 16:27:21 +0200 Subject: [PATCH 0674/1013] block,bfq: Disable writeback throttling [ Upstream commit b5dc5d4d1f4ff9032eb6c21a3c571a1317dc9289 ] Similarly to CFQ, BFQ has its write-throttling heuristics, and it is better not to combine them with further write-throttling heuristics of a different nature. So this commit disables write-back throttling for a device if BFQ is used as I/O scheduler for that device. 
Signed-off-by: Luca Miccio Signed-off-by: Paolo Valente Tested-by: Oleksandr Natalenko Tested-by: Lee Tibbert Signed-off-by: Jens Axboe Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- block/bfq-iosched.c | 3 ++- block/blk-wbt.c | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index a4783da90ba8..0f860cf0d56d 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -108,6 +108,7 @@ #include "blk-mq-tag.h" #include "blk-mq-sched.h" #include "bfq-iosched.h" +#include "blk-wbt.h" #define BFQ_BFQQ_FNS(name) \ void bfq_mark_bfqq_##name(struct bfq_queue *bfqq) \ @@ -4775,7 +4776,7 @@ static int bfq_init_queue(struct request_queue *q, struct elevator_type *e) bfq_init_root_group(bfqd->root_group, bfqd); bfq_init_entity(&bfqd->oom_bfqq.entity, bfqd->root_group); - + wbt_disable_default(q); return 0; out_free: diff --git a/block/blk-wbt.c b/block/blk-wbt.c index 6a9a0f03a67b..e59d59c11ebb 100644 --- a/block/blk-wbt.c +++ b/block/blk-wbt.c @@ -654,7 +654,7 @@ void wbt_set_write_cache(struct rq_wb *rwb, bool write_cache_on) } /* - * Disable wbt, if enabled by default. Only called from CFQ. + * Disable wbt, if enabled by default. */ void wbt_disable_default(struct request_queue *q) { From 6d7bdad132d59c429dc340f8d10a4977e58813b0 Mon Sep 17 00:00:00 2001 From: Guoqing Jiang Date: Mon, 9 Oct 2017 10:32:48 +0800 Subject: [PATCH 0675/1013] md: always set THREAD_WAKEUP and wake up wqueue if thread existed [ Upstream commit d1d90147c9680aaec4a5757932c2103c42c9c23b ] Since commit 4ad23a976413 ("MD: use per-cpu counter for writes_pending"), the wait_queue is only got invoked if THREAD_WAKEUP is not set previously. With above change, I can see process_metadata_update could always hang on the wait queue, because mddev->thread could stay on 'D' status and the THREAD_WAKEUP flag is not cleared since there are lots of place to wake up mddev->thread. Then deadlock happened as follows: linux175:~ # ps aux|grep md|grep D root 20117 0.0 0.0 0 0 ? D 03:45 0:00 [md0_raid1] root 20125 0.0 0.0 0 0 ? D 03:45 0:00 [md0_cluster_rec] linux175:~ # cat /proc/20117/stack [] dlm_lock_sync+0x94/0xd0 [md_cluster] [] lock_token+0x34/0xd0 [md_cluster] [] metadata_update_start+0x64/0x110 [md_cluster] [] md_update_sb.part.58+0x9b/0x860 [md_mod] [] md_update_sb+0x15/0x30 [md_mod] [] md_check_recovery+0x266/0x490 [md_mod] [] raid1d+0x42/0x810 [raid1] [] md_thread+0x122/0x150 [md_mod] [] kthread+0x101/0x140 linux175:~ # cat /proc/20125/stack [] recv_daemon+0x3f9/0x5c0 [md_cluster] [] md_thread+0x122/0x150 [md_mod] [] kthread+0x101/0x140 So let's revert the part of code in the commit to resovle the problem since we can't get lots of benefits of previous change. 
Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending") Signed-off-by: Guoqing Jiang Signed-off-by: Shaohua Li Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/md/md.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 98ea86309ceb..6bf093cef958 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -7468,8 +7468,8 @@ void md_wakeup_thread(struct md_thread *thread) { if (thread) { pr_debug("md: waking up MD thread %s.\n", thread->tsk->comm); - if (!test_and_set_bit(THREAD_WAKEUP, &thread->flags)) - wake_up(&thread->wqueue); + set_bit(THREAD_WAKEUP, &thread->flags); + wake_up(&thread->wqueue); } } EXPORT_SYMBOL(md_wakeup_thread); From 2f48fc1742a2da6196482695ec3bcd2438d24a13 Mon Sep 17 00:00:00 2001 From: William Tu Date: Thu, 5 Oct 2017 12:07:12 -0700 Subject: [PATCH 0676/1013] ip_gre: check packet length and mtu correctly in erspan tx [ Upstream commit f192970de860d3ab90aa9e2a22853201a57bde78 ] Similarly to early patch for erspan_xmit(), the ARPHDR_ETHER device is the length of the whole ether packet. So skb->len should subtract the dev->hard_header_len. Fixes: 1a66a836da63 ("gre: add collect_md mode to ERSPAN tunnel") Fixes: 84e54fe0a5ea ("gre: introduce native tunnel support for ERSPAN") Signed-off-by: William Tu Cc: Xin Long Cc: David Laight Reviewed-by: Xin Long Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/ip_gre.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c index 467e44d7587d..045331204097 100644 --- a/net/ipv4/ip_gre.c +++ b/net/ipv4/ip_gre.c @@ -579,8 +579,8 @@ static void erspan_fb_xmit(struct sk_buff *skb, struct net_device *dev, if (gre_handle_offloads(skb, false)) goto err_free_rt; - if (skb->len > dev->mtu) { - pskb_trim(skb, dev->mtu); + if (skb->len > dev->mtu + dev->hard_header_len) { + pskb_trim(skb, dev->mtu + dev->hard_header_len); truncate = true; } @@ -731,8 +731,8 @@ static netdev_tx_t erspan_xmit(struct sk_buff *skb, if (skb_cow_head(skb, dev->needed_headroom)) goto free_skb; - if (skb->len - dev->hard_header_len > dev->mtu) { - pskb_trim(skb, dev->mtu); + if (skb->len > dev->mtu + dev->hard_header_len) { + pskb_trim(skb, dev->mtu + dev->hard_header_len); truncate = true; } From 9704f8147e88213f2fa580f713b42b08a4f1a7d2 Mon Sep 17 00:00:00 2001 From: Wei Wang Date: Fri, 6 Oct 2017 12:06:04 -0700 Subject: [PATCH 0677/1013] ipv6: grab rt->rt6i_ref before allocating pcpu rt [ Upstream commit a94b9367e044ba672c9f4105eb1516ff6ff4948a ] After rwlock is replaced with rcu and spinlock, ip6_pol_route() will be called with only rcu held. That means rt6 route deletion could happen simultaneously with rt6_make_pcpu_rt(). This could potentially cause memory leak if rt6_release() is called right before rt6_make_pcpu_rt() on the same route. This patch grabs rt->rt6i_ref safely before calling rt6_make_pcpu_rt() to make sure rt6_release() will not get triggered while rt6_make_pcpu_rt() is in progress. And rt6_release() is called after rt6_make_pcpu_rt() is finished. Note: As we are incrementing rt->rt6i_ref in ip6_pol_route(), there is a very slim chance that fib6_purge_rt() will be triggered unnecessarily when deleting a route if ip6_pol_route() running on another thread picks this route as well and tries to make pcpu cache for it. Signed-off-by: Wei Wang Signed-off-by: Martin KaFai Lau Signed-off-by: Eric Dumazet Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 58 ++++++++++++++++++++++++------------------------ 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 598efa8cfe25..76b47682f77f 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1055,7 +1055,6 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt) static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) { - struct fib6_table *table = rt->rt6i_table; struct rt6_info *pcpu_rt, *prev, **p; pcpu_rt = ip6_rt_pcpu_alloc(rt); @@ -1066,28 +1065,20 @@ static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) return net->ipv6.ip6_null_entry; } - read_lock_bh(&table->tb6_lock); - if (rt->rt6i_pcpu) { - p = this_cpu_ptr(rt->rt6i_pcpu); - prev = cmpxchg(p, NULL, pcpu_rt); - if (prev) { - /* If someone did it before us, return prev instead */ - dst_release_immediate(&pcpu_rt->dst); - pcpu_rt = prev; - } - } else { - /* rt has been removed from the fib6 tree - * before we have a chance to acquire the read_lock. - * In this case, don't brother to create a pcpu rt - * since rt is going away anyway. The next - * dst_check() will trigger a re-lookup. - */ + dst_hold(&pcpu_rt->dst); + p = this_cpu_ptr(rt->rt6i_pcpu); + prev = cmpxchg(p, NULL, pcpu_rt); + if (prev) { + /* If someone did it before us, return prev instead */ + /* release refcnt taken by ip6_rt_pcpu_alloc() */ dst_release_immediate(&pcpu_rt->dst); - pcpu_rt = rt; + /* release refcnt taken by above dst_hold() */ + dst_release_immediate(&pcpu_rt->dst); + dst_hold(&prev->dst); + pcpu_rt = prev; } - dst_hold(&pcpu_rt->dst); + rt6_dst_from_metrics_check(pcpu_rt); - read_unlock_bh(&table->tb6_lock); return pcpu_rt; } @@ -1177,19 +1168,28 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, if (pcpu_rt) { read_unlock_bh(&table->tb6_lock); } else { - /* We have to do the read_unlock first - * because rt6_make_pcpu_route() may trigger - * ip6_dst_gc() which will take the write_lock. - */ - dst_hold(&rt->dst); - read_unlock_bh(&table->tb6_lock); - pcpu_rt = rt6_make_pcpu_route(rt); - dst_release(&rt->dst); + /* atomic_inc_not_zero() is needed when using rcu */ + if (atomic_inc_not_zero(&rt->rt6i_ref)) { + /* We have to do the read_unlock first + * because rt6_make_pcpu_route() may trigger + * ip6_dst_gc() which will take the write_lock. + * + * No dst_hold() on rt is needed because grabbing + * rt->rt6i_ref makes sure rt can't be released. + */ + read_unlock_bh(&table->tb6_lock); + pcpu_rt = rt6_make_pcpu_route(rt); + rt6_release(rt); + } else { + /* rt is already removed from tree */ + read_unlock_bh(&table->tb6_lock); + pcpu_rt = net->ipv6.ip6_null_entry; + dst_hold(&pcpu_rt->dst); + } } trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6); return pcpu_rt; - } } EXPORT_SYMBOL_GPL(ip6_pol_route); From 44ee83c6d6e0b0913f32f3580270daa9c784cc0d Mon Sep 17 00:00:00 2001 From: Andrew Jeffery Date: Fri, 1 Sep 2017 15:08:58 +0930 Subject: [PATCH 0678/1013] leds: pca955x: Don't invert requested value in pca955x_gpio_set_value() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 52ca7d0f7bdad832b291ed979146443533ee79c0 ] The PCA9552 lines can be used either for driving LEDs or as GPIOs. The manual states that for LEDs, the operation is open-drain: The LSn LED select registers determine the source of the LED data. 
00 = output is set LOW (LED on) 01 = output is set high-impedance (LED off; default) 10 = output blinks at PWM0 rate 11 = output blinks at PWM1 rate For GPIOs it suggests a pull-up so that the open-case drives the line high: For use as output, connect external pull-up resistor to the pin and size it according to the DC recommended operating characteristics. LED output pin is HIGH when the output is programmed as high-impedance, and LOW when the output is programmed LOW through the ‘LED selector’ register. The output can be pulse-width controlled when PWM0 or PWM1 are used. Now, I have a hardware design that uses the LED controller to control LEDs. However, for $reasons, we're using the leds-gpio driver to drive the them. The reasons are here are a tangent but lead to the discovery of the inversion, which manifested as the LEDs being set to full brightness at boot when we expected them to be off. As we're driving the LEDs through leds-gpio, this means wending our way through the gpiochip abstractions. So with that in mind we need to describe an active-low GPIO configuration to drive the LEDs as though they were GPIOs. The set() gpiochip callback in leds-pca955x does the following: ... if (val) pca955x_led_set(&led->led_cdev, LED_FULL); else pca955x_led_set(&led->led_cdev, LED_OFF); ... Where LED_FULL = 255. pca955x_led_set() in turn does: ... switch (value) { case LED_FULL: ls = pca955x_ledsel(ls, ls_led, PCA955X_LS_LED_ON); break; ... Where PCA955X_LS_LED_ON is defined as: #define PCA955X_LS_LED_ON 0x0 /* Output LOW */ So here we have some type confusion: We've crossed domains from GPIO behaviour to LED behaviour without accounting for possible inversions in the process. Stepping back to leds-gpio for a moment, during probe() we call create_gpio_led(), which eventually executes: if (template->default_state == LEDS_GPIO_DEFSTATE_KEEP) { state = gpiod_get_value_cansleep(led_dat->gpiod); if (state < 0) return state; } else { state = (template->default_state == LEDS_GPIO_DEFSTATE_ON); } ... ret = gpiod_direction_output(led_dat->gpiod, state); In the devicetree the GPIO is annotated as active-low, and gpiod_get_value_cansleep() handles this for us: int gpiod_get_value_cansleep(const struct gpio_desc *desc) { int value; might_sleep_if(extra_checks); VALIDATE_DESC(desc); value = _gpiod_get_raw_value(desc); if (value < 0) return value; if (test_bit(FLAG_ACTIVE_LOW, &desc->flags)) value = !value; return value; } _gpiod_get_raw_value() in turn calls through the get() callback for the gpiochip implementation, so returning to our get() implementation in leds-pca955x we find we extract the raw value from hardware: static int pca955x_gpio_get_value(struct gpio_chip *gc, unsigned int offset) { struct pca955x *pca955x = gpiochip_get_data(gc); struct pca955x_led *led = &pca955x->leds[offset]; u8 reg = pca955x_read_input(pca955x->client, led->led_num / 8); return !!(reg & (1 << (led->led_num % 8))); } This behaviour is not symmetric with that of set(), where the val is inverted by the driver. 
Closing the loop on the GPIO_ACTIVE_LOW inversions, gpiod_direction_output(), like gpiod_get_value_cansleep(), handles it for us: int gpiod_direction_output(struct gpio_desc *desc, int value) { VALIDATE_DESC(desc); if (test_bit(FLAG_ACTIVE_LOW, &desc->flags)) value = !value; else value = !!value; return _gpiod_direction_output_raw(desc, value); } All-in-all, with a value of 'keep' for default-state property in a leds-gpio child node, the current state of the hardware will in-fact be inverted; precisely the opposite of what was intended. Rework leds-pca955x so that we avoid the incorrect inversion and clarify the semantics with respect to GPIO. Signed-off-by: Andrew Jeffery Reviewed-by: Cédric Le Goater Tested-by: Joel Stanley Tested-by: Matt Spinler Signed-off-by: Jacek Anaszewski Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/leds/leds-pca955x.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/drivers/leds/leds-pca955x.c b/drivers/leds/leds-pca955x.c index 905729191d3e..78183f90820e 100644 --- a/drivers/leds/leds-pca955x.c +++ b/drivers/leds/leds-pca955x.c @@ -61,6 +61,10 @@ #define PCA955X_LS_BLINK0 0x2 /* Blink at PWM0 rate */ #define PCA955X_LS_BLINK1 0x3 /* Blink at PWM1 rate */ +#define PCA955X_GPIO_INPUT LED_OFF +#define PCA955X_GPIO_HIGH LED_OFF +#define PCA955X_GPIO_LOW LED_FULL + enum pca955x_type { pca9550, pca9551, @@ -329,9 +333,9 @@ static int pca955x_set_value(struct gpio_chip *gc, unsigned int offset, struct pca955x_led *led = &pca955x->leds[offset]; if (val) - return pca955x_led_set(&led->led_cdev, LED_FULL); - else - return pca955x_led_set(&led->led_cdev, LED_OFF); + return pca955x_led_set(&led->led_cdev, PCA955X_GPIO_HIGH); + + return pca955x_led_set(&led->led_cdev, PCA955X_GPIO_LOW); } static void pca955x_gpio_set_value(struct gpio_chip *gc, unsigned int offset, @@ -355,8 +359,11 @@ static int pca955x_gpio_get_value(struct gpio_chip *gc, unsigned int offset) static int pca955x_gpio_direction_input(struct gpio_chip *gc, unsigned int offset) { - /* To use as input ensure pin is not driven */ - return pca955x_set_value(gc, offset, 0); + struct pca955x *pca955x = gpiochip_get_data(gc); + struct pca955x_led *led = &pca955x->leds[offset]; + + /* To use as input ensure pin is not driven. */ + return pca955x_led_set(&led->led_cdev, PCA955X_GPIO_INPUT); } static int pca955x_gpio_direction_output(struct gpio_chip *gc, From 56ea88ec49042b34f6ba10280c1fddf0703b5e57 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 4 Oct 2017 20:43:35 +0200 Subject: [PATCH 0679/1013] Bluetooth: hci_uart_set_flow_control: Fix NULL deref when using serdev [ Upstream commit 7841d554809b518a22349e7e39b6b63f8a48d0fb ] Fix a NULL pointer deref (hu->tty) when calling hci_uart_set_flow_control on hci_uart-s using serdev. 
Signed-off-by: Hans de Goede Signed-off-by: Marcel Holtmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/hci_ldisc.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c index 6e2403805784..6aef3bde10d7 100644 --- a/drivers/bluetooth/hci_ldisc.c +++ b/drivers/bluetooth/hci_ldisc.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -298,6 +299,12 @@ void hci_uart_set_flow_control(struct hci_uart *hu, bool enable) unsigned int set = 0; unsigned int clear = 0; + if (hu->serdev) { + serdev_device_set_flow_control(hu->serdev, !enable); + serdev_device_set_rts(hu->serdev, !enable); + return; + } + if (enable) { /* Disable hardware flow control */ ktermios = tty->termios; From da548d5a6f9eb188b4b0b07dd70f60d0b7ca05d3 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 4 Oct 2017 20:43:36 +0200 Subject: [PATCH 0680/1013] Bluetooth: hci_bcm: Fix setting of irq trigger type [ Upstream commit 227630cccdbb8f8a1b24ac26517b75079c9a69c9 ] This commit fixes 2 issues with host-wake irq trigger type handling in hci_bcm: 1) bcm_setup_sleep sets sleep_params.host_wake_active based on bcm_device.irq_polarity, but bcm_request_irq was always requesting IRQF_TRIGGER_RISING as trigger type independent of irq_polarity. This was a problem when the irq is described as a GpioInt rather then an Interrupt in the DSDT as for GpioInt-s the value passed to request_irq is honored. This commit fixes this by requesting the correct trigger type depending on bcm_device.irq_polarity. 2) bcm_device.irq_polarity was used to directly store an ACPI polarity value (ACPI_ACTIVE_*). This is undesirable because hci_bcm is also used with device-tree and checking for something like ACPI_ACTIVE_LOW in a non ACPI specific function like bcm_request_irq feels wrong. This commit fixes this by renaming irq_polarity to irq_active_low and changing its type to a bool. Signed-off-by: Hans de Goede Signed-off-by: Marcel Holtmann Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/bluetooth/hci_bcm.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/drivers/bluetooth/hci_bcm.c b/drivers/bluetooth/hci_bcm.c index e2540113d0da..73d2d88ddc03 100644 --- a/drivers/bluetooth/hci_bcm.c +++ b/drivers/bluetooth/hci_bcm.c @@ -68,7 +68,7 @@ struct bcm_device { u32 init_speed; u32 oper_speed; int irq; - u8 irq_polarity; + bool irq_active_low; #ifdef CONFIG_PM struct hci_uart *hu; @@ -213,7 +213,9 @@ static int bcm_request_irq(struct bcm_data *bcm) } err = devm_request_irq(&bdev->pdev->dev, bdev->irq, bcm_host_wake, - IRQF_TRIGGER_RISING, "host_wake", bdev); + bdev->irq_active_low ? IRQF_TRIGGER_FALLING : + IRQF_TRIGGER_RISING, + "host_wake", bdev); if (err) goto unlock; @@ -253,7 +255,7 @@ static int bcm_setup_sleep(struct hci_uart *hu) struct sk_buff *skb; struct bcm_set_sleep_mode sleep_params = default_sleep_params; - sleep_params.host_wake_active = !bcm->dev->irq_polarity; + sleep_params.host_wake_active = !bcm->dev->irq_active_low; skb = __hci_cmd_sync(hu->hdev, 0xfc27, sizeof(sleep_params), &sleep_params, HCI_INIT_TIMEOUT); @@ -690,10 +692,8 @@ static const struct acpi_gpio_mapping acpi_bcm_int_first_gpios[] = { }; #ifdef CONFIG_ACPI -static u8 acpi_active_low = ACPI_ACTIVE_LOW; - /* IRQ polarity of some chipsets are not defined correctly in ACPI table. 
*/ -static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { +static const struct dmi_system_id bcm_active_low_irq_dmi_table[] = { { .ident = "Asus T100TA", .matches = { @@ -701,7 +701,6 @@ static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { "ASUSTeK COMPUTER INC."), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T100TA"), }, - .driver_data = &acpi_active_low, }, { .ident = "Asus T100CHI", @@ -710,7 +709,6 @@ static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { "ASUSTeK COMPUTER INC."), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "T100CHI"), }, - .driver_data = &acpi_active_low, }, { /* Handle ThinkPad 8 tablets with BCM2E55 chipset ACPI ID */ .ident = "Lenovo ThinkPad 8", @@ -718,7 +716,6 @@ static const struct dmi_system_id bcm_wrong_irq_dmi_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"), DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "ThinkPad 8"), }, - .driver_data = &acpi_active_low, }, { } }; @@ -733,13 +730,13 @@ static int bcm_resource(struct acpi_resource *ares, void *data) switch (ares->type) { case ACPI_RESOURCE_TYPE_EXTENDED_IRQ: irq = &ares->data.extended_irq; - dev->irq_polarity = irq->polarity; + dev->irq_active_low = irq->polarity == ACPI_ACTIVE_LOW; break; case ACPI_RESOURCE_TYPE_GPIO: gpio = &ares->data.gpio; if (gpio->connection_type == ACPI_RESOURCE_GPIO_TYPE_INT) - dev->irq_polarity = gpio->polarity; + dev->irq_active_low = gpio->polarity == ACPI_ACTIVE_LOW; break; case ACPI_RESOURCE_TYPE_SERIAL_BUS: @@ -834,11 +831,11 @@ static int bcm_acpi_probe(struct bcm_device *dev) return ret; acpi_dev_free_resource_list(&resources); - dmi_id = dmi_first_match(bcm_wrong_irq_dmi_table); + dmi_id = dmi_first_match(bcm_active_low_irq_dmi_table); if (dmi_id) { bt_dev_warn(dev, "%s: Overwriting IRQ polarity to active low", dmi_id->ident); - dev->irq_polarity = *(u8 *)dmi_id->driver_data; + dev->irq_active_low = true; } return 0; From cbd6b3694a4adec779cdb9266883a21254d5897c Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Tue, 29 Aug 2017 05:32:31 -0400 Subject: [PATCH 0681/1013] i40e/i40evf: spread CPU affinity hints across online CPUs only [ Upstream commit be664cbefc50977aaefc868ba6a1109ec9b7449d ] Currently, when setting up the IRQ for a q_vector, we set an affinity hint based on the v_idx of that q_vector. Meaning a loop iterates on v_idx, which is an incremental value, and the cpumask is created based on this value. This is a problem in systems with multiple logical CPUs per core (like in simultaneous multithreading (SMT) scenarios). If we disable some logical CPUs, by turning SMT off for example, we will end up with a sparse cpu_online_mask, i.e., only the first CPU in a core is online, and incremental filling in q_vector cpumask might lead to multiple offline CPUs being assigned to q_vectors. Example: if we have a system with 8 cores each one containing 8 logical CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is disabled, only the 1st CPU in each core remains online, so the cpu_online_mask in this case would have only 8 bits set, in a sparse way. In general case, when SMT is off the cpu_online_mask has only C bits set: 0, 1*N, 2*N, ..., C*(N-1) where C == # of cores; N == # of logical CPUs per core. In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set. Instead, we should only assign hints for CPUs which are online. Even better, the kernel already provides a function, cpumask_local_spread() which takes an index and returns a CPU, spreading the interrupts across local NUMA nodes first, and then remote ones if necessary. 
Since we generally have a 1:1 mapping between vectors and CPUs, there is no real advantage to spreading vectors to local CPUs first. In order to avoid mismatch of the default XPS hints, we'll pass -1 so that it spreads across all CPUs without regard to the node locality. Note that we don't need to change the q_vector->affinity_mask as this is initialized to cpu_possible_mask, until an actual affinity is set and then notified back to us. Signed-off-by: Jacob Keller Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/i40e/i40e_main.c | 16 +++++++++++----- drivers/net/ethernet/intel/i40evf/i40evf_main.c | 9 ++++++--- 2 files changed, 17 insertions(+), 8 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index ea20aacd5e1d..b2cde9b16d82 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -2874,14 +2874,15 @@ static void i40e_vsi_free_rx_resources(struct i40e_vsi *vsi) static void i40e_config_xps_tx_ring(struct i40e_ring *ring) { struct i40e_vsi *vsi = ring->vsi; + int cpu; if (!ring->q_vector || !ring->netdev) return; if ((vsi->tc_config.numtc <= 1) && !test_and_set_bit(__I40E_TX_XPS_INIT_DONE, &ring->state)) { - netif_set_xps_queue(ring->netdev, - get_cpu_mask(ring->q_vector->v_idx), + cpu = cpumask_local_spread(ring->q_vector->v_idx, -1); + netif_set_xps_queue(ring->netdev, get_cpu_mask(cpu), ring->queue_index); } @@ -3471,6 +3472,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename) int tx_int_idx = 0; int vector, err; int irq_num; + int cpu; for (vector = 0; vector < q_vectors; vector++) { struct i40e_q_vector *q_vector = vsi->q_vectors[vector]; @@ -3506,10 +3508,14 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename) q_vector->affinity_notify.notify = i40e_irq_affinity_notify; q_vector->affinity_notify.release = i40e_irq_affinity_release; irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify); - /* get_cpu_mask returns a static constant mask with - * a permanent lifetime so it's ok to use here. + /* Spread affinity hints out across online CPUs. + * + * get_cpu_mask returns a static constant mask with + * a permanent lifetime so it's ok to pass to + * irq_set_affinity_hint without making a copy. */ - irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx)); + cpu = cpumask_local_spread(q_vector->v_idx, -1); + irq_set_affinity_hint(irq_num, get_cpu_mask(cpu)); } vsi->irqs_ready = true; diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c index 1825d956bb00..1ccad6f30ebf 100644 --- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c +++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c @@ -546,6 +546,7 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename) unsigned int vector, q_vectors; unsigned int rx_int_idx = 0, tx_int_idx = 0; int irq_num, err; + int cpu; i40evf_irq_disable(adapter); /* Decrement for Other and TCP Timer vectors */ @@ -584,10 +585,12 @@ i40evf_request_traffic_irqs(struct i40evf_adapter *adapter, char *basename) q_vector->affinity_notify.release = i40evf_irq_affinity_release; irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify); - /* get_cpu_mask returns a static constant mask with - * a permanent lifetime so it's ok to use here. + /* Spread the IRQ affinity hints across online CPUs. 
Note that + * get_cpu_mask returns a mask with a permanent lifetime so + * it's safe to use as a hint for irq_set_affinity_hint. */ - irq_set_affinity_hint(irq_num, get_cpu_mask(q_vector->v_idx)); + cpu = cpumask_local_spread(q_vector->v_idx, -1); + irq_set_affinity_hint(irq_num, get_cpu_mask(cpu)); } return 0; From 45c911bb1814980da78a34423305310d5d327b39 Mon Sep 17 00:00:00 2001 From: Gabriele Paoloni Date: Thu, 28 Sep 2017 15:33:05 +0100 Subject: [PATCH 0682/1013] PCI/AER: Report non-fatal errors only to the affected endpoint [ Upstream commit 86acc790717fb60fb51ea3095084e331d8711c74 ] Previously, if an non-fatal error was reported by an endpoint, we called report_error_detected() for the endpoint, every sibling on the bus, and their descendents. If any of them did not implement the .error_detected() method, do_recovery() failed, leaving all these devices unrecovered. For example, the system described in the bugzilla below has two devices: 0000:74:02.0 [19e5:a230] SAS controller, driver has .error_detected() 0000:74:03.0 [19e5:a235] SATA controller, driver lacks .error_detected() When a device such as 74:02.0 reported a non-fatal error, do_recovery() failed because 74:03.0 lacked an .error_detected() method. But per PCIe r3.1, sec 6.2.2.2.2, such an error does not compromise the Link and does not affect 74:03.0: Non-fatal errors are uncorrectable errors which cause a particular transaction to be unreliable but the Link is otherwise fully functional. Isolating Non-fatal from Fatal errors provides Requester/Receiver logic in a device or system management software the opportunity to recover from the error without resetting the components on the Link and disturbing other transactions in progress. Devices not associated with the transaction in error are not impacted by the error. Report non-fatal errors only to the endpoint that reported them. We really want to check for AER_NONFATAL here, but the current code structure doesn't allow that. Looking for pci_channel_io_normal is the best we can do now. Link: https://bugzilla.kernel.org/show_bug.cgi?id=197055 Fixes: 6c2b374d7485 ("PCI-Express AER implemetation: AER core and aerdriver") Signed-off-by: Gabriele Paoloni Signed-off-by: Dongdong Liu [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/pci/pcie/aer/aerdrv_core.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c index 890efcc574cb..744805232155 100644 --- a/drivers/pci/pcie/aer/aerdrv_core.c +++ b/drivers/pci/pcie/aer/aerdrv_core.c @@ -390,7 +390,14 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev, * If the error is reported by an end point, we think this * error is related to the upstream link of the end point. */ - pci_walk_bus(dev->bus, cb, &result_data); + if (state == pci_channel_io_normal) + /* + * the error is non fatal so the bus is ok, just invoke + * the callback for the function that logged the error. 
+ */ + cb(dev, &result_data); + else + pci_walk_bus(dev->bus, cb, &result_data); } return result_data.result; From 16e1626e54f835cb009de675d1f6b5a0ff9183d9 Mon Sep 17 00:00:00 2001 From: Tom Zanussi Date: Fri, 22 Sep 2017 14:58:17 -0500 Subject: [PATCH 0683/1013] tracing: Exclude 'generic fields' from histograms [ Upstream commit a15f7fc20389a8827d5859907568b201234d4b79 ] There are a small number of 'generic fields' (comm/COMM/cpu/CPU) that are found by trace_find_event_field() but are only meant for filtering. Specifically, they unlike normal fields, they have a size of 0 and thus wreak havoc when used as a histogram key. Exclude these (return -EINVAL) when used as histogram keys. Link: http://lkml.kernel.org/r/956154cbc3e8a4f0633d619b886c97f0f0edf7b4.1506105045.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace_events_hist.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c index 1c21d0e2a145..7eb975a2d0e1 100644 --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -450,7 +450,7 @@ static int create_val_field(struct hist_trigger_data *hist_data, } field = trace_find_event_field(file->event_call, field_name); - if (!field) { + if (!field || !field->size) { ret = -EINVAL; goto out; } @@ -548,7 +548,7 @@ static int create_key_field(struct hist_trigger_data *hist_data, } field = trace_find_event_field(file->event_call, field_name); - if (!field) { + if (!field || !field->size) { ret = -EINVAL; goto out; } From f9e51fb046db69a2e83d7d7422e33f3402d1800b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jean-Fran=C3=A7ois=20T=C3=AAtu?= Date: Fri, 29 Sep 2017 16:19:44 -0400 Subject: [PATCH 0684/1013] ASoC: codecs: msm8916-wcd-analog: fix micbias level MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit 664611e7e02f76fbc5470ef545b2657ed25c292b ] The macro used to set the microphone bias level causes the snd_soc_write() call to overwrite other fields in the CDC_A_MICB_1_VAL register. The macro also does not return the proper level value to use. This fixes this by preserving all bits from the register that are not the level while setting the level. 
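A minimal userspace sketch of the read-modify-write pattern the fix switches to: the voltage code must be shifted into bits 7:3 (the missing "<< 3" was one half of the bug) and the rest of the register must be preserved the way snd_soc_update_bits() preserves it (the other half). The MICB_* values are taken from the constants in the diff that follows; the helper name micb_regval() and the 0x05 shadow value are invented purely for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define MICB_MIN_VAL    1600                    /* mV, from the patch */
    #define MICB_STEP_SIZE  50                      /* mV per step */
    #define FIELD_SHIFT     3
    #define FIELD_MASK      (0x1f << FIELD_SHIFT)   /* bits 7:3, i.e. GENMASK(7, 3) */

    /* Encode a micbias voltage into bits 7:3; omitting this shift was the bug. */
    static uint8_t micb_regval(unsigned int mv)
    {
            return (uint8_t)(((mv - MICB_MIN_VAL) / MICB_STEP_SIZE) << FIELD_SHIFT);
    }

    int main(void)
    {
            uint8_t reg = 0x05;     /* pretend bits 2:0 hold unrelated settings */

            /* Update only the voltage field, as snd_soc_update_bits() does. */
            reg = (reg & ~FIELD_MASK) | (micb_regval(2700) & FIELD_MASK);

            /* 2.7 V encodes to 0x16, lands in bits 7:3, bits 2:0 stay 0x05: prints 0xb5 */
            printf("reg = 0x%02x\n", reg);
            return 0;
    }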
Signed-off-by: Jean-François Têtu Acked-by: Srinivas Kandagatla Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/soc/codecs/msm8916-wcd-analog.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sound/soc/codecs/msm8916-wcd-analog.c b/sound/soc/codecs/msm8916-wcd-analog.c index 549c269acc7d..a42f8ebb9670 100644 --- a/sound/soc/codecs/msm8916-wcd-analog.c +++ b/sound/soc/codecs/msm8916-wcd-analog.c @@ -104,7 +104,7 @@ #define CDC_A_MICB_1_VAL (0xf141) #define MICB_MIN_VAL 1600 #define MICB_STEP_SIZE 50 -#define MICB_VOLTAGE_REGVAL(v) ((v - MICB_MIN_VAL)/MICB_STEP_SIZE) +#define MICB_VOLTAGE_REGVAL(v) (((v - MICB_MIN_VAL)/MICB_STEP_SIZE) << 3) #define MICB_1_VAL_MICB_OUT_VAL_MASK GENMASK(7, 3) #define MICB_1_VAL_MICB_OUT_VAL_V2P70V ((0x16) << 3) #define MICB_1_VAL_MICB_OUT_VAL_V1P80V ((0x4) << 3) @@ -349,8 +349,9 @@ static void pm8916_wcd_analog_micbias_enable(struct snd_soc_codec *codec) | MICB_1_CTL_EXT_PRECHARG_EN_ENABLE); if (wcd->micbias_mv) { - snd_soc_write(codec, CDC_A_MICB_1_VAL, - MICB_VOLTAGE_REGVAL(wcd->micbias_mv)); + snd_soc_update_bits(codec, CDC_A_MICB_1_VAL, + MICB_1_VAL_MICB_OUT_VAL_MASK, + MICB_VOLTAGE_REGVAL(wcd->micbias_mv)); /* * Special headset needs MICBIAS as 2.7V so wait for * 50 msec for the MICBIAS to reach 2.7 volts. From f5fec0590cd8732c810c0b6d9ae9e3a42021f2cb Mon Sep 17 00:00:00 2001 From: Ed Blake Date: Mon, 2 Oct 2017 11:00:33 +0100 Subject: [PATCH 0685/1013] ASoC: img-parallel-out: Add pm_runtime_get/put to set_fmt callback [ Upstream commit c70458890ff15d858bd347fa9f563818bcd6e457 ] Add pm_runtime_get_sync and pm_runtime_put calls to set_fmt callback function. This fixes a bus error during boot when CONFIG_SUSPEND is defined when this function gets called while the device is runtime disabled and device registers are accessed while the clock is disabled. Signed-off-by: Ed Blake Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/soc/img/img-parallel-out.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/soc/img/img-parallel-out.c b/sound/soc/img/img-parallel-out.c index 23b0f0f6ec9c..2fc8a6372206 100644 --- a/sound/soc/img/img-parallel-out.c +++ b/sound/soc/img/img-parallel-out.c @@ -164,9 +164,11 @@ static int img_prl_out_set_fmt(struct snd_soc_dai *dai, unsigned int fmt) return -EINVAL; } + pm_runtime_get_sync(prl->dev); reg = img_prl_out_readl(prl, IMG_PRL_OUT_CTL); reg = (reg & ~IMG_PRL_OUT_CTL_EDGE_MASK) | control_set; img_prl_out_writel(prl, reg, IMG_PRL_OUT_CTL); + pm_runtime_put(prl->dev); return 0; } From 97f41b41c432e5a80c91445d92c2f4b729984d36 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 29 Sep 2017 13:29:40 +1000 Subject: [PATCH 0686/1013] powerpc/xmon: Avoid tripping SMP hardlockup watchdog [ Upstream commit 064996d62a33ffe10264b5af5dca92d54f60f806 ] The SMP hardlockup watchdog cross-checks other CPUs for lockups, which causes xmon headaches because it's assuming interrupts hard disabled means no watchdog troubles. Try to improve that by calling touch_nmi_watchdog() in obvious places where secondaries are spinning. Also annotate these spin loops with spin_begin/end calls. 
Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/xmon/xmon.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index c008083fbc4f..2c8b325591cc 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -530,14 +530,19 @@ static int xmon_core(struct pt_regs *regs, int fromipi) waiting: secondary = 1; + spin_begin(); while (secondary && !xmon_gate) { if (in_xmon == 0) { - if (fromipi) + if (fromipi) { + spin_end(); goto leave; + } secondary = test_and_set_bit(0, &in_xmon); } - barrier(); + spin_cpu_relax(); + touch_nmi_watchdog(); } + spin_end(); if (!secondary && !xmon_gate) { /* we are the first cpu to come in */ @@ -568,21 +573,25 @@ static int xmon_core(struct pt_regs *regs, int fromipi) mb(); xmon_gate = 1; barrier(); + touch_nmi_watchdog(); } cmdloop: while (in_xmon) { if (secondary) { + spin_begin(); if (cpu == xmon_owner) { if (!test_and_set_bit(0, &xmon_taken)) { secondary = 0; + spin_end(); continue; } /* missed it */ while (cpu == xmon_owner) - barrier(); + spin_cpu_relax(); } - barrier(); + spin_cpu_relax(); + touch_nmi_watchdog(); } else { cmd = cmds(regs); if (cmd != 0) { From fa21a13d76a7eed4b1df1a5223752051dbfde948 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 29 Sep 2017 13:29:39 +1000 Subject: [PATCH 0687/1013] powerpc/watchdog: Do not trigger SMP crash from touch_nmi_watchdog [ Upstream commit 80e4d70b06863e0104e5a0dc78aa3710297fbd4b ] In xmon, touch_nmi_watchdog() is not expected to be checking that other CPUs have not touched the watchdog, so the code will just call touch_nmi_watchdog() once before re-enabling hard interrupts. Just update our CPU's state, and ignore apparently stuck SMP threads. Arguably touch_nmi_watchdog should check for SMP lockups, and callers should be fixed, but that's not trivial for the input code of xmon. Signed-off-by: Nicholas Piggin Signed-off-by: Michael Ellerman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kernel/watchdog.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c index 57190f384f63..ce848ff84edd 100644 --- a/arch/powerpc/kernel/watchdog.c +++ b/arch/powerpc/kernel/watchdog.c @@ -276,9 +276,12 @@ void arch_touch_nmi_watchdog(void) { unsigned long ticks = tb_ticks_per_usec * wd_timer_period_ms * 1000; int cpu = smp_processor_id(); + u64 tb = get_tb(); - if (get_tb() - per_cpu(wd_timer_tb, cpu) >= ticks) - watchdog_timer_interrupt(cpu); + if (tb - per_cpu(wd_timer_tb, cpu) >= ticks) { + per_cpu(wd_timer_tb, cpu) = tb; + wd_smp_clear_cpu_pending(cpu, tb); + } } EXPORT_SYMBOL(arch_touch_nmi_watchdog); From 5d1b6695edb773d27f100eafc8cc9a34c58cb07f Mon Sep 17 00:00:00 2001 From: Marcelo Ricardo Leitner Date: Tue, 3 Oct 2017 19:20:08 -0300 Subject: [PATCH 0688/1013] sctp: silence warns on sctp_stream_init allocations [ Upstream commit 1ae2eaaa229bc350b6f38fbf4ab9c873532aecfb ] As SCTP supports up to 65535 streams, that can lead to very large allocations in sctp_stream_init(). As Xin Long noticed, systems with small amounts of memory are more prone to not have enough memory and dump warnings on dmesg initiated by user actions. Thus, silence them. Also, if the reallocation of stream->out is not necessary, skip it and keep the memory we already have. 
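For a sense of scale (numbers are indicative only, not taken from the patch): with the maximum of 65535 outgoing streams and a per-stream descriptor of a few tens of bytes, this kcalloc() can be a multi-hundred-kilobyte, higher-order allocation that may legitimately fail on a small or fragmented system. The resulting allocation pattern, in sketch form:

    /* Sketch of the intent; the exact code is in the hunk below. */
    gfp |= __GFP_NOWARN;                    /* failure is handled, don't splat */

    if (outcnt != stream->outcnt) {         /* reallocate only if the count changed */
            kfree(stream->out);
            stream->out = kcalloc(outcnt, sizeof(*stream->out), gfp);
            if (!stream->out)
                    return -ENOMEM;
    }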
Reported-by: Xin Long Tested-by: Xin Long Signed-off-by: Marcelo Ricardo Leitner Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/sctp/stream.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/net/sctp/stream.c b/net/sctp/stream.c index fa8371ff05c4..724adf2786a2 100644 --- a/net/sctp/stream.c +++ b/net/sctp/stream.c @@ -40,9 +40,14 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt, { int i; + gfp |= __GFP_NOWARN; + /* Initial stream->out size may be very big, so free it and alloc - * a new one with new outcnt to save memory. + * a new one with new outcnt to save memory if needed. */ + if (outcnt == stream->outcnt) + goto in; + kfree(stream->out); stream->out = kcalloc(outcnt, sizeof(*stream->out), gfp); @@ -53,6 +58,7 @@ int sctp_stream_init(struct sctp_stream *stream, __u16 outcnt, __u16 incnt, for (i = 0; i < stream->outcnt; i++) stream->out[i].state = SCTP_STREAM_OPEN; +in: if (!incnt) return 0; From c01b06d9ac357e1d6d95d7b62be60ecd69071c2b Mon Sep 17 00:00:00 2001 From: Nicolas Dechesne Date: Tue, 3 Oct 2017 11:49:51 +0200 Subject: [PATCH 0689/1013] ASoC: codecs: msm8916-wcd-analog: fix module autoload [ Upstream commit 46d69e141d479585c105a4d5b2337cd2ce6967e5 ] If the driver is built as a module, autoload won't work because the module alias information is not filled. So user-space can't match the registered device with the corresponding module. Export the module alias information using the MODULE_DEVICE_TABLE() macro. Before this patch: $ modinfo snd_soc_msm8916_analog | grep alias $ After this patch: $ modinfo snd_soc_msm8916_analog | grep alias alias: of:N*T*Cqcom,pm8916-wcd-analog-codecC* alias: of:N*T*Cqcom,pm8916-wcd-analog-codec Signed-off-by: Nicolas Dechesne Signed-off-by: Mark Brown Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- sound/soc/codecs/msm8916-wcd-analog.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/sound/soc/codecs/msm8916-wcd-analog.c b/sound/soc/codecs/msm8916-wcd-analog.c index a42f8ebb9670..18933bf6473f 100644 --- a/sound/soc/codecs/msm8916-wcd-analog.c +++ b/sound/soc/codecs/msm8916-wcd-analog.c @@ -1242,6 +1242,8 @@ static const struct of_device_id pm8916_wcd_analog_spmi_match_table[] = { { } }; +MODULE_DEVICE_TABLE(of, pm8916_wcd_analog_spmi_match_table); + static struct platform_driver pm8916_wcd_analog_spmi_driver = { .driver = { .name = "qcom,pm8916-wcd-spmi-codec", From 16ddeff35b7b84eefccddae23133f3eced68a8c0 Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Fri, 11 Aug 2017 11:14:58 -0700 Subject: [PATCH 0690/1013] fm10k: fix mis-ordered parameters in declaration for .ndo_set_vf_bw [ Upstream commit 3e256ac5b1ec307e5dd5a4c99fbdbc651446c738 ] We've had support for setting both a minimum and maximum bandwidth via .ndo_set_vf_bw since commit 883a9ccbae56 ("fm10k: Add support for SR-IOV to driver", 2014-09-20). Likely because we do not support minimum rates, the declaration mis-ordered the "unused" parameter, which causes warnings when analyzed with cppcheck. Fix this warning by properly declaring the min_rate and max_rate variables in the declaration and definition (rather than using "unused"). Also rename "rate" to max_rate so as to clarify that we only support setting the maximum rate. 
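The hazard being cleaned up is easy to reproduce in miniature (stand-alone illustration, names invented): C matches a declaration and a definition only by position and type, never by parameter name, so transposed names compile silently and mislead both readers and static checkers:

    /* declaration, e.g. in a header */
    int set_bw(int vf, int min_rate, int max_rate);

    /* definition with the last two names transposed: accepted by the
     * compiler, but "max_rate" in the header is really the minimum here.
     */
    int set_bw(int vf, int max_rate, int min_rate)
    {
            return max_rate;
    }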
Signed-off-by: Jacob Keller Tested-by: Krishneil Singh Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/fm10k/fm10k.h | 4 ++-- drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 9 +++++---- 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h b/drivers/net/ethernet/intel/fm10k/fm10k.h index 689c413b7782..d2f9a2dd76a2 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k.h +++ b/drivers/net/ethernet/intel/fm10k/fm10k.h @@ -526,8 +526,8 @@ s32 fm10k_iov_update_pvid(struct fm10k_intfc *interface, u16 glort, u16 pvid); int fm10k_ndo_set_vf_mac(struct net_device *netdev, int vf_idx, u8 *mac); int fm10k_ndo_set_vf_vlan(struct net_device *netdev, int vf_idx, u16 vid, u8 qos, __be16 vlan_proto); -int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, int rate, - int unused); +int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, + int __always_unused min_rate, int max_rate); int fm10k_ndo_get_vf_config(struct net_device *netdev, int vf_idx, struct ifla_vf_info *ivi); diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c index 5f4dac0d36ef..f919199944a0 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c @@ -482,7 +482,7 @@ int fm10k_ndo_set_vf_vlan(struct net_device *netdev, int vf_idx, u16 vid, } int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, - int __always_unused unused, int rate) + int __always_unused min_rate, int max_rate) { struct fm10k_intfc *interface = netdev_priv(netdev); struct fm10k_iov_data *iov_data = interface->iov_data; @@ -493,14 +493,15 @@ int fm10k_ndo_set_vf_bw(struct net_device *netdev, int vf_idx, return -EINVAL; /* rate limit cannot be less than 10Mbs or greater than link speed */ - if (rate && ((rate < FM10K_VF_TC_MIN) || rate > FM10K_VF_TC_MAX)) + if (max_rate && + (max_rate < FM10K_VF_TC_MIN || max_rate > FM10K_VF_TC_MAX)) return -EINVAL; /* store values */ - iov_data->vf_info[vf_idx].rate = rate; + iov_data->vf_info[vf_idx].rate = max_rate; /* update hardware configuration */ - hw->iov.ops.configure_tc(hw, vf_idx, rate); + hw->iov.ops.configure_tc(hw, vf_idx, max_rate); return 0; } From 096232d99989e34c80bd3a15e343fc3bbdfa42d8 Mon Sep 17 00:00:00 2001 From: Dick Kennedy Date: Fri, 29 Sep 2017 17:34:42 -0700 Subject: [PATCH 0691/1013] scsi: lpfc: Fix secure firmware updates [ Upstream commit 184fc2b9a8bcbda9c14d0a1e7fbecfc028c7702e ] Firmware update fails with: status x17 add_status x56 on the final write If multiple DMA buffers are used for the download, some firmware revs have difficulty with signatures and crcs split across the dma buffer boundaries. Resolve by making all writes be a single 4k page in length. Signed-off-by: Dick Kennedy Signed-off-by: James Smart Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. 
Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/lpfc/lpfc_hw4.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/scsi/lpfc/lpfc_hw4.h b/drivers/scsi/lpfc/lpfc_hw4.h index 1db0a38683f4..2b145966c73f 100644 --- a/drivers/scsi/lpfc/lpfc_hw4.h +++ b/drivers/scsi/lpfc/lpfc_hw4.h @@ -3636,7 +3636,7 @@ struct lpfc_mbx_get_port_name { #define MB_CEQ_STATUS_QUEUE_FLUSHING 0x4 #define MB_CQE_STATUS_DMA_FAILED 0x5 -#define LPFC_MBX_WR_CONFIG_MAX_BDE 8 +#define LPFC_MBX_WR_CONFIG_MAX_BDE 1 struct lpfc_mbx_wr_object { struct mbox_header header; union { From 6fe8e4f3e4e937ab762fe9114818c64ae215c882 Mon Sep 17 00:00:00 2001 From: Dick Kennedy Date: Fri, 29 Sep 2017 17:34:32 -0700 Subject: [PATCH 0692/1013] scsi: lpfc: PLOGI failures during NPIV testing [ Upstream commit e8bcf0ae4c0346fdc78ebefe0eefcaa6a6622d38 ] Local Reject/Invalid RPI errors seen during discovery. Temporary RPI cleanup was occurring regardless of SLI rev. It's only necessary on SLI-4. Adjust the test for whether cleanup is necessary. Signed-off-by: Dick Kennedy Signed-off-by: James Smart Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/lpfc/lpfc_hbadisc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/lpfc/lpfc_hbadisc.c b/drivers/scsi/lpfc/lpfc_hbadisc.c index 499df9d17339..d9a03beb76a4 100644 --- a/drivers/scsi/lpfc/lpfc_hbadisc.c +++ b/drivers/scsi/lpfc/lpfc_hbadisc.c @@ -4983,7 +4983,8 @@ lpfc_nlp_remove(struct lpfc_vport *vport, struct lpfc_nodelist *ndlp) lpfc_cancel_retry_delay_tmo(vport, ndlp); if ((ndlp->nlp_flag & NLP_DEFER_RM) && !(ndlp->nlp_flag & NLP_REG_LOGIN_SEND) && - !(ndlp->nlp_flag & NLP_RPI_REGISTERED)) { + !(ndlp->nlp_flag & NLP_RPI_REGISTERED) && + phba->sli_rev != LPFC_SLI_REV4) { /* For this case we need to cleanup the default rpi * allocated by the firmware. */ From 8da9104839c86b9f80fb45df60ec0beae9f103b1 Mon Sep 17 00:00:00 2001 From: Dick Kennedy Date: Fri, 29 Sep 2017 17:34:31 -0700 Subject: [PATCH 0693/1013] scsi: lpfc: Fix warning messages when NVME_TARGET_FC not defined [ Upstream commit 2299e4323d2bf6e0728fdc6b9e8e9704978d2dd7 ] Warning messages when NVME_TARGET_FC not defined on ppc builds The lpfc_nvmet_replenish_context() function is only meaningful when NVME target mode enabled. Surround the function body with ifdefs for target mode enablement. Signed-off-by: Dick Kennedy Signed-off-by: James Smart Reported-by: Stephen Rothwell Reviewed-by: Johannes Thumshirn Signed-off-by: Martin K. 
Petersen Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/scsi/lpfc/lpfc_nvmet.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/scsi/lpfc/lpfc_nvmet.c b/drivers/scsi/lpfc/lpfc_nvmet.c index 3c5b054a56ac..7ac1a067d780 100644 --- a/drivers/scsi/lpfc/lpfc_nvmet.c +++ b/drivers/scsi/lpfc/lpfc_nvmet.c @@ -1464,6 +1464,7 @@ static struct lpfc_nvmet_ctxbuf * lpfc_nvmet_replenish_context(struct lpfc_hba *phba, struct lpfc_nvmet_ctx_info *current_infop) { +#if (IS_ENABLED(CONFIG_NVME_TARGET_FC)) struct lpfc_nvmet_ctxbuf *ctx_buf = NULL; struct lpfc_nvmet_ctx_info *get_infop; int i; @@ -1511,6 +1512,7 @@ lpfc_nvmet_replenish_context(struct lpfc_hba *phba, get_infop = get_infop->nvmet_ctx_next_cpu; } +#endif /* Nothing found, all contexts for the MRQ are in-flight */ return NULL; } From 4297cf42691eb626ab5f72f416cf7d9ea126a410 Mon Sep 17 00:00:00 2001 From: Alan Brady Date: Tue, 22 Aug 2017 06:57:53 -0400 Subject: [PATCH 0694/1013] i40e: fix client notify of VF reset [ Upstream commit c53d11f669c0e7d0daf46a717b6712ad0b09de99 ] Currently there is a bug in which the PF driver fails to inform clients of a VF reset which then causes clients to leak resources. The bug exists because we were incorrectly checking the I40E_VF_STATE_PRE_ENABLE bit. When a VF is first init we go through a reset to initialize variables and allocate resources but we don't want to inform clients of this first reset since the client isn't fully enabled yet so we set a state bit signifying we're in a "pre-enabled" client state. During the first reset we should be clearing the bit, allowing all following resets to notify the client of the reset when the bit is not set. This patch fixes the issue by negating the 'test_and_clear_bit' check to accurately reflect the behavior we want. Signed-off-by: Alan Brady Tested-by: Andrew Bowers Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index fa25ea512e43..e368b0237a1b 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -1008,8 +1008,8 @@ static void i40e_cleanup_reset_vf(struct i40e_vf *vf) set_bit(I40E_VF_STATE_ACTIVE, &vf->vf_states); clear_bit(I40E_VF_STATE_DISABLED, &vf->vf_states); /* Do not notify the client during VF init */ - if (test_and_clear_bit(I40E_VF_STATE_PRE_ENABLE, - &vf->vf_states)) + if (!test_and_clear_bit(I40E_VF_STATE_PRE_ENABLE, + &vf->vf_states)) i40e_notify_client_of_vf_reset(pf, abs_vf_id); vf->num_vlan = 0; } From b27bbf1f5b9e62b7b0b8ce9a4f5e474c3119a9a9 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Mon, 2 Oct 2017 12:39:09 -0600 Subject: [PATCH 0695/1013] vfio/pci: Virtualize Maximum Payload Size [ Upstream commit 523184972b282cd9ca17a76f6ca4742394856818 ] With virtual PCI-Express chipsets, we now see userspace/guest drivers trying to match the physical MPS setting to a virtual downstream port. Of course a lone physical device surrounded by virtual interconnects cannot make a correct decision for a proper MPS setting. Instead, let's virtualize the MPS control register so that writes through to hardware are disallowed. Userspace drivers like QEMU assume they can write anything to the device and we'll filter out anything dangerous. 
Since mismatched MPS can lead to AER and other faults, let's add it to the kernel side rather than relying on userspace virtualization to handle it. Signed-off-by: Alex Williamson Reviewed-by: Eric Auger Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/vfio/pci/vfio_pci_config.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c index 5628fe114347..91335e6de88a 100644 --- a/drivers/vfio/pci/vfio_pci_config.c +++ b/drivers/vfio/pci/vfio_pci_config.c @@ -849,11 +849,13 @@ static int __init init_pci_cap_exp_perm(struct perm_bits *perm) /* * Allow writes to device control fields, except devctl_phantom, - * which could confuse IOMMU, and the ARI bit in devctl2, which + * which could confuse IOMMU, MPS, which can break communication + * with other physical devices, and the ARI bit in devctl2, which * is set at probe time. FLR gets virtualized via our writefn. */ p_setw(perm, PCI_EXP_DEVCTL, - PCI_EXP_DEVCTL_BCR_FLR, ~PCI_EXP_DEVCTL_PHANTOM); + PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD, + ~PCI_EXP_DEVCTL_PHANTOM); p_setw(perm, PCI_EXP_DEVCTL2, NO_VIRT, ~PCI_EXP_DEVCTL2_ARI); return 0; } From 2b401d9f1d4543e4e1ad23a46726376adb958831 Mon Sep 17 00:00:00 2001 From: Marek Szyprowski Date: Mon, 2 Oct 2017 08:39:35 +0200 Subject: [PATCH 0696/1013] ARM: exynos_defconfig: Enable UAS support for Odroid HC1 board [ Upstream commit a99897f550de96841aecb811455a67ad7a4e39a7 ] Odroid HC1 board has built-in JMicron USB to SATA bridge, which supports UAS protocol. Compile-in support for it (instead of enabling it as module) to make sure that all built-in storage devices are available for rootfs. The bridge itself also supports fallback to standard USB Mass Storage protocol, but USB Mass Storage class doesn't bind to it when UAS is compiled as module and modules are not (yet) available. Signed-off-by: Marek Szyprowski Signed-off-by: Krzysztof Kozlowski Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- arch/arm/configs/exynos_defconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig index 8c2a2619971b..f1d7834990ec 100644 --- a/arch/arm/configs/exynos_defconfig +++ b/arch/arm/configs/exynos_defconfig @@ -244,7 +244,7 @@ CONFIG_USB_STORAGE_ONETOUCH=m CONFIG_USB_STORAGE_KARMA=m CONFIG_USB_STORAGE_CYPRESS_ATACB=m CONFIG_USB_STORAGE_ENE_UB6250=m -CONFIG_USB_UAS=m +CONFIG_USB_UAS=y CONFIG_USB_DWC3=y CONFIG_USB_DWC2=y CONFIG_USB_HSIC_USB3503=y From fb383223d00ffa445b266c5fd54ed80fd8c57d17 Mon Sep 17 00:00:00 2001 From: Jacob Keller Date: Mon, 2 Oct 2017 07:17:50 -0700 Subject: [PATCH 0697/1013] fm10k: ensure we process SM mbx when processing VF mbx [ Upstream commit 17a91809942ca32c70026d2d5ba3348a2c4fdf8f ] When we process VF mailboxes, the driver is likely going to also queue up messages to the switch manager. This process merely queues up the FIFO, but doesn't actually begin the transmission process. Because we hold the mailbox lock during this VF processing, the PF<->SM mailbox is not getting processed at this time. Ensure that we actually process the PF<->SM mailbox in between each PF<->VF mailbox. This should ensure prompt transmission of the messages queued up after each VF message is received and handled. 
Signed-off-by: Jacob Keller Tested-by: Krishneil Singh Signed-off-by: Jeff Kirsher Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/fm10k/fm10k_iov.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c index f919199944a0..e72fd52bacfe 100644 --- a/drivers/net/ethernet/intel/fm10k/fm10k_iov.c +++ b/drivers/net/ethernet/intel/fm10k/fm10k_iov.c @@ -126,6 +126,9 @@ s32 fm10k_iov_mbx(struct fm10k_intfc *interface) struct fm10k_mbx_info *mbx = &vf_info->mbx; u16 glort = vf_info->glort; + /* process the SM mailbox first to drain outgoing messages */ + hw->mbx.ops.process(hw, &hw->mbx); + /* verify port mapping is valid, if not reset port */ if (vf_info->vf_flags && !fm10k_glort_valid_pf(hw, glort)) hw->iov.ops.reset_lport(hw, vf_info); From a58a3af86a4e8a2bd1d2d041341efa28818c6e89 Mon Sep 17 00:00:00 2001 From: Mick Tarsel Date: Thu, 28 Sep 2017 13:53:18 -0700 Subject: [PATCH 0698/1013] ibmvnic: Set state UP [ Upstream commit e876a8a7e9dd89dc88c12ca2e81beb478dbe9897 ] State is initially reported as UNKNOWN. Before register call netif_carrier_off(). Once the device is opened, call netif_carrier_on() in order to set the state to UP. Signed-off-by: Mick Tarsel Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/ibm/ibmvnic.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index c66abd476023..3b0db01ead1f 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -927,6 +927,7 @@ static int ibmvnic_open(struct net_device *netdev) } rc = __ibmvnic_open(netdev); + netif_carrier_on(netdev); mutex_unlock(&adapter->reset_lock); return rc; @@ -3899,6 +3900,7 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) if (rc) goto ibmvnic_init_fail; + netif_carrier_off(netdev); rc = register_netdev(netdev); if (rc) { dev_err(&dev->dev, "failed to register netdev rc=%d\n", rc); From 0006d8c76b0cbfc1dfefe41ac93881f1f8c15432 Mon Sep 17 00:00:00 2001 From: Mike Manning Date: Mon, 25 Sep 2017 22:01:36 +0100 Subject: [PATCH 0699/1013] net: ipv6: send NS for DAD when link operationally up [ Upstream commit 1f372c7bfb23286d2bf4ce0423ab488e86b74bb2 ] The NS for DAD are sent on admin up as long as a valid qdisc is found. A race condition exists by which these packets will not egress the interface if the operational state of the lower device is not yet up. The solution is to delay DAD until the link is operationally up according to RFC2863. Rather than only doing this, follow the existing code checks by deferring IPv6 device initialization altogether. The fix allows DAD on devices like tunnels that are controlled by userspace control plane. The fix has no impact on regular deployments, but means that there is no IPv6 connectivity until the port has been opened in the case of port-based network access control, which should be desirable. Signed-off-by: Mike Manning Signed-off-by: David S. 
Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv6/addrconf.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 8a1c846d3df9..2ec39404c449 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -303,10 +303,10 @@ static struct ipv6_devconf ipv6_devconf_dflt __read_mostly = { .disable_policy = 0, }; -/* Check if a valid qdisc is available */ -static inline bool addrconf_qdisc_ok(const struct net_device *dev) +/* Check if link is ready: is it up and is a valid qdisc available */ +static inline bool addrconf_link_ready(const struct net_device *dev) { - return !qdisc_tx_is_noop(dev); + return netif_oper_up(dev) && !qdisc_tx_is_noop(dev); } static void addrconf_del_rs_timer(struct inet6_dev *idev) @@ -451,7 +451,7 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev) ndev->token = in6addr_any; - if (netif_running(dev) && addrconf_qdisc_ok(dev)) + if (netif_running(dev) && addrconf_link_ready(dev)) ndev->if_flags |= IF_READY; ipv6_mc_init_dev(ndev); @@ -3404,7 +3404,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, /* restore routes for permanent addresses */ addrconf_permanent_addr(dev); - if (!addrconf_qdisc_ok(dev)) { + if (!addrconf_link_ready(dev)) { /* device is not ready yet. */ pr_info("ADDRCONF(NETDEV_UP): %s: link is not ready\n", dev->name); @@ -3419,7 +3419,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, run_pending = 1; } } else if (event == NETDEV_CHANGE) { - if (!addrconf_qdisc_ok(dev)) { + if (!addrconf_link_ready(dev)) { /* device is still not ready. */ break; } From 0fbdd907e4b3836b025b8690823c7271f939a938 Mon Sep 17 00:00:00 2001 From: "Wei Hu(Xavier)" Date: Fri, 29 Sep 2017 23:10:12 +0800 Subject: [PATCH 0700/1013] RDMA/hns: Avoid NULL pointer exception [ Upstream commit 5e437b1d7e8d31ff9a4b8e898eb3a6cee309edd9 ] After the loop in hns_roce_v1_mr_free_work_fn function, it is possible that all qps will have been freed (in which case ne will be 0). If that happens, then later in the function when we dereference hr_qp we will get an exception. Check ne is not 0 to make sure we actually have an hr_qp left to work on. This patch fixes the smatch error as below: drivers/infiniband/hw/hns/hns_roce_hw_v1.c:1009 hns_roce_v1_mr_free_work_fn() error: we previously assumed 'hr_qp' could be null Signed-off-by: Wei Hu (Xavier) Signed-off-by: Lijun Ou Signed-off-by: Shaobo Xu Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c index 747efd1ae5a6..8208c30f03c5 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c @@ -1001,6 +1001,11 @@ static void hns_roce_v1_mr_free_work_fn(struct work_struct *work) } } + if (!ne) { + dev_err(dev, "Reseved loop qp is absent!\n"); + goto free_work; + } + do { ret = hns_roce_v1_poll_cq(&mr_free_cq->ib_cq, ne, wc); if (ret < 0) { From ae35e16e0a57da2a7aa8631c8929dd79bac74bde Mon Sep 17 00:00:00 2001 From: Arvind Yadav Date: Sat, 23 Sep 2017 13:25:30 +0530 Subject: [PATCH 0701/1013] staging: greybus: light: Release memory obtained by kasprintf [ Upstream commit 04820da21050b35eed68aa046115d810163ead0c ] Free memory region, if gb_lights_channel_config is not successful. 
Signed-off-by: Arvind Yadav Reviewed-by: Rui Miguel Silva Signed-off-by: Greg Kroah-Hartman Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/staging/greybus/light.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/staging/greybus/light.c b/drivers/staging/greybus/light.c index 3f4148c92308..0f538b8c3a07 100644 --- a/drivers/staging/greybus/light.c +++ b/drivers/staging/greybus/light.c @@ -925,6 +925,8 @@ static void __gb_lights_led_unregister(struct gb_channel *channel) return; led_classdev_unregister(cdev); + kfree(cdev->name); + cdev->name = NULL; channel->led = NULL; } From 71e51e4d488d56345d72dc3c89e67af015afea22 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Fri, 29 Sep 2017 16:22:54 +0800 Subject: [PATCH 0702/1013] clk: sunxi-ng: sun6i: Rename HDMI DDC clock to avoid name collision [ Upstream commit 7f3ed79188f2f094d0ee366fa858857fb7f511ba ] The HDMI DDC clock found in the CCU is the parent of the actual DDC clock within the HDMI controller. That clock is also named "hdmi-ddc". Rename the one in the CCU to "ddc". This makes more sense than renaming the one in the HDMI controller to something else. Fixes: c6e6c96d8fa6 ("clk: sunxi-ng: Add A31/A31s clocks") Signed-off-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/clk/sunxi-ng/ccu-sun6i-a31.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/clk/sunxi-ng/ccu-sun6i-a31.c b/drivers/clk/sunxi-ng/ccu-sun6i-a31.c index 8af434815fba..241fb13f1c06 100644 --- a/drivers/clk/sunxi-ng/ccu-sun6i-a31.c +++ b/drivers/clk/sunxi-ng/ccu-sun6i-a31.c @@ -608,7 +608,7 @@ static SUNXI_CCU_M_WITH_MUX_GATE(hdmi_clk, "hdmi", lcd_ch1_parents, 0x150, 0, 4, 24, 2, BIT(31), CLK_SET_RATE_PARENT); -static SUNXI_CCU_GATE(hdmi_ddc_clk, "hdmi-ddc", "osc24M", 0x150, BIT(30), 0); +static SUNXI_CCU_GATE(hdmi_ddc_clk, "ddc", "osc24M", 0x150, BIT(30), 0); static SUNXI_CCU_GATE(ps_clk, "ps", "lcd1-ch1", 0x140, BIT(31), 0); From 0aaff15c10132eb36ab16b4785cd207075e5943b Mon Sep 17 00:00:00 2001 From: Hoang Tran Date: Wed, 27 Sep 2017 18:30:58 +0200 Subject: [PATCH 0703/1013] tcp: fix under-evaluated ssthresh in TCP Vegas [ Upstream commit cf5d74b85ef40c202c76d90959db4d850f301b95 ] With the commit 76174004a0f19785 (tcp: do not slow start when cwnd equals ssthresh), the comparison to the reduced cwnd in tcp_vegas_ssthresh() would under-evaluate the ssthresh. Signed-off-by: Hoang Tran Signed-off-by: David S. Miller Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_vegas.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c index 218cfcc77650..ee113ff15fd0 100644 --- a/net/ipv4/tcp_vegas.c +++ b/net/ipv4/tcp_vegas.c @@ -158,7 +158,7 @@ EXPORT_SYMBOL_GPL(tcp_vegas_cwnd_event); static inline u32 tcp_vegas_ssthresh(struct tcp_sock *tp) { - return min(tp->snd_ssthresh, tp->snd_cwnd-1); + return min(tp->snd_ssthresh, tp->snd_cwnd); } static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked) From 051c3df7d6b8d5f1948503637c69fc939db69035 Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Thu, 28 Sep 2017 13:53:27 +0200 Subject: [PATCH 0704/1013] rtc: set the alarm to the next expiring timer [ Upstream commit 74717b28cb32e1ad3c1042cafd76b264c8c0f68d ] If there is any non expired timer in the queue, the RTC alarm is never set. This is an issue when adding a timer that expires before the next non expired timer. Ensure the RTC alarm is set in that case. 
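A concrete failure case for the check added below (timings invented for illustration): the queue already holds a timer for t+60s and the hardware alarm is programmed accordingly; a new timer for t+10s is enqueued and becomes the new head of the timerqueue, and without reprogramming the alarm nothing fires for the next 60 seconds. In sketch form:

    struct timerqueue_node *next = timerqueue_getnext(&rtc->timerqueue);

    timerqueue_add(&rtc->timerqueue, &timer->node);

    /* Program the hardware alarm if there was no pending timer at all OR
     * the new timer now expires first (the case that used to be missed).
     */
    if (!next || ktime_before(timer->node.expires, next->expires)) {
            struct rtc_wkalrm alarm;

            alarm.time = rtc_ktime_to_tm(timer->node.expires);
            alarm.enabled = 1;
            /* ... hand the alarm to the driver, as rtc_timer_enqueue() does ... */
    }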
Fixes: 2b2f5ff00f63 ("rtc: interface: ignore expired timers when enqueuing new timers") Signed-off-by: Alexandre Belloni Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/interface.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c index 8cec9a02c0b8..9eb32ead63db 100644 --- a/drivers/rtc/interface.c +++ b/drivers/rtc/interface.c @@ -779,7 +779,7 @@ static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer) } timerqueue_add(&rtc->timerqueue, &timer->node); - if (!next) { + if (!next || ktime_before(timer->node.expires, next->expires)) { struct rtc_wkalrm alarm; int err; alarm.time = rtc_ktime_to_tm(timer->node.expires); From 80879ecb4624d58e10cfe03550056c34483187a5 Mon Sep 17 00:00:00 2001 From: Nicholas Piggin Date: Fri, 1 Sep 2017 14:29:56 +1000 Subject: [PATCH 0705/1013] cpuidle: fix broadcast control when broadcast can not be entered [ Upstream commit f187851b9b4a76952b1158b86434563dd2031103 ] When failing to enter broadcast timer mode for an idle state that requires it, a new state is selected that does not require broadcast, but the broadcast variable remains set. This causes tick_broadcast_exit to be called despite not having entered broadcast mode. This causes the WARN_ON_ONCE(!irqs_disabled()) to trigger in some cases. It does not appear to cause problems for code today, but seems to violate the interface so should be fixed. Signed-off-by: Nicholas Piggin Reviewed-by: Thomas Gleixner Signed-off-by: Rafael J. Wysocki Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/cpuidle/cpuidle.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c index 484cc8909d5c..ed4df58a855e 100644 --- a/drivers/cpuidle/cpuidle.c +++ b/drivers/cpuidle/cpuidle.c @@ -208,6 +208,7 @@ int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, return -EBUSY; } target_state = &drv->states[index]; + broadcast = false; } /* Take note of the planned idle state. */ From 0d74c05ca7ef083cbc676aacfd147ff60de54c95 Mon Sep 17 00:00:00 2001 From: Eric Anholt Date: Tue, 15 Aug 2017 16:47:19 -0700 Subject: [PATCH 0706/1013] drm/vc4: Avoid using vrefresh==0 mode in DSI htotal math. [ Upstream commit af2eca53206c59ce9308a4f5f46c4a104a179b6b ] The incoming mode might have a missing vrefresh field if it came from drmModeSetCrtc(), which the kernel is supposed to calculate using drm_mode_vrefresh(). We could either use that or the adjusted_mode's original vrefresh value. However, we can maintain a more exact vrefresh value (not just the integer approximation), by scaling by the ratio of our clocks. v2: Use math suggested by Andrzej Hajda instead. v3: Simplify math now that adjusted_mode->clock isn't padded. v4: Drop some parens. Signed-off-by: Eric Anholt Link: https://patchwork.freedesktop.org/patch/msgid/20170815234722.20700-2-eric@anholt.net Reviewed-by: Andrzej Hajda Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/vc4/vc4_dsi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/vc4/vc4_dsi.c b/drivers/gpu/drm/vc4/vc4_dsi.c index d1e0dc908048..04796d7d0fdb 100644 --- a/drivers/gpu/drm/vc4/vc4_dsi.c +++ b/drivers/gpu/drm/vc4/vc4_dsi.c @@ -866,7 +866,8 @@ static bool vc4_dsi_encoder_mode_fixup(struct drm_encoder *encoder, adjusted_mode->clock = pixel_clock_hz / 1000 + 1; /* Given the new pixel clock, adjust HFP to keep vrefresh the same. 
*/ - adjusted_mode->htotal = pixel_clock_hz / (mode->vrefresh * mode->vtotal); + adjusted_mode->htotal = adjusted_mode->clock * mode->htotal / + mode->clock; adjusted_mode->hsync_end += adjusted_mode->htotal - mode->htotal; adjusted_mode->hsync_start += adjusted_mode->htotal - mode->htotal; From ecffae11228f96c045eb38fa60bdf17dc677b94b Mon Sep 17 00:00:00 2001 From: Scott Franco Date: Tue, 26 Sep 2017 06:44:13 -0700 Subject: [PATCH 0707/1013] IB/opa_vnic: Properly clear Mac Table Digest [ Upstream commit 4bbdfe25600c1909c26747d0b5c39fd0e409bb87 ] Clear the MAC table digest when the MAC table is freed. Reviewed-by: Niranjana Vishwanathapura Signed-off-by: Scott Franco Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c index afa938bd26d6..a72278e9cd27 100644 --- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c +++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_encap.c @@ -139,6 +139,7 @@ void opa_vnic_release_mac_tbl(struct opa_vnic_adapter *adapter) rcu_assign_pointer(adapter->mactbl, NULL); synchronize_rcu(); opa_vnic_free_mac_tbl(mactbl); + adapter->info.vport.mac_tbl_digest = 0; mutex_unlock(&adapter->mactbl_lock); } From 0da9db57c0423f281db8ae2fb1a67e88d3fecd34 Mon Sep 17 00:00:00 2001 From: Niranjana Vishwanathapura Date: Tue, 26 Sep 2017 06:44:07 -0700 Subject: [PATCH 0708/1013] IB/opa_vnic: Properly return the total MACs in UC MAC list [ Upstream commit b77eb45e0d9c324245d165656ab3b38b6f386436 ] Do not include EM specified MAC address in total MACs of the UC MAC list. Reviewed-by: Sudeep Dutt Signed-off-by: Niranjana Vishwanathapura Signed-off-by: Dennis Dalessandro Signed-off-by: Doug Ledford Signed-off-by: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c index c2733964379c..9655cc3aa3a0 100644 --- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c +++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema_iface.c @@ -348,7 +348,7 @@ void opa_vnic_query_mcast_macs(struct opa_vnic_adapter *adapter, void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, struct opa_veswport_iface_macs *macs) { - u16 start_idx, tot_macs, num_macs, idx = 0, count = 0; + u16 start_idx, tot_macs, num_macs, idx = 0, count = 0, em_macs = 0; struct netdev_hw_addr *ha; start_idx = be16_to_cpu(macs->start_idx); @@ -359,8 +359,10 @@ void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, /* Do not include EM specified MAC address */ if (!memcmp(adapter->info.vport.base_mac_addr, ha->addr, - ARRAY_SIZE(adapter->info.vport.base_mac_addr))) + ARRAY_SIZE(adapter->info.vport.base_mac_addr))) { + em_macs++; continue; + } if (start_idx > idx++) continue; @@ -383,7 +385,7 @@ void opa_vnic_query_ucast_macs(struct opa_vnic_adapter *adapter, } tot_macs = netdev_hw_addr_list_count(&adapter->netdev->dev_addrs) + - netdev_uc_count(adapter->netdev); + netdev_uc_count(adapter->netdev) - em_macs; macs->tot_macs_in_lst = cpu_to_be16(tot_macs); macs->num_macs_in_msg = cpu_to_be16(count); macs->gen_count = cpu_to_be16(adapter->info.vport.uc_macs_gen_count); From 7254834c43bd32842e96fd856c9c9793a29e6fe2 Mon Sep 17 
00:00:00 2001 From: Daniel Lezcano Date: Thu, 19 Oct 2017 19:05:43 +0200 Subject: [PATCH 0709/1013] thermal/drivers/hisi: Fix missing interrupt enablement commit c176b10b025acee4dc8f2ab1cd64eb73b5ccef53 upstream. The interrupt for the temperature threshold is not enabled at the end of the probe function, enable it after the setup is complete. On the other side, the irq_enabled is not correctly set as we are checking if the interrupt is masked where 'yes' means irq_enabled=false. irq_get_irqchip_state(data->irq, IRQCHIP_STATE_MASKED, &data->irq_enabled); As we are always enabling the interrupt, it is pointless to check if the interrupt is masked or not, just set irq_enabled to 'true'. Signed-off-by: Daniel Lezcano Reviewed-by: Leo Yan Tested-by: Leo Yan Signed-off-by: Eduardo Valentin Signed-off-by: Kevin Wangtao Signed-off-by: Greg Kroah-Hartman --- drivers/thermal/hisi_thermal.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index bd3572c41585..8381696241d6 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -345,8 +345,7 @@ static int hisi_thermal_probe(struct platform_device *pdev) } hisi_thermal_enable_bind_irq_sensor(data); - irq_get_irqchip_state(data->irq, IRQCHIP_STATE_MASKED, - &data->irq_enabled); + data->irq_enabled = true; for (i = 0; i < HISI_MAX_SENSORS; ++i) { ret = hisi_thermal_register_sensor(pdev, data, @@ -358,6 +357,8 @@ static int hisi_thermal_probe(struct platform_device *pdev) hisi_thermal_toggle_sensor(&data->sensors[i], true); } + enable_irq(data->irq); + return 0; } From cf826c57785384341c3f0cf2cfbfe86c02eb6b9b Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Thu, 19 Oct 2017 19:05:45 +0200 Subject: [PATCH 0710/1013] thermal/drivers/hisi: Fix kernel panic on alarm interrupt commit 2cb4de785c40d4a2132cfc13e63828f5a28c3351 upstream. The threaded interrupt for the alarm interrupt is requested before the temperature controller is setup. This one can fire an interrupt immediately leading to a kernel panic as the sensor data is not initialized. In order to prevent that, move the threaded irq after the Tsensor is setup. 
Signed-off-by: Daniel Lezcano Reviewed-by: Leo Yan Tested-by: Leo Yan Signed-off-by: Eduardo Valentin Signed-off-by: Kevin Wangtao Signed-off-by: Greg Kroah-Hartman --- drivers/thermal/hisi_thermal.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index 8381696241d6..26bb8e2a482f 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -317,15 +317,6 @@ static int hisi_thermal_probe(struct platform_device *pdev) if (data->irq < 0) return data->irq; - ret = devm_request_threaded_irq(&pdev->dev, data->irq, - hisi_thermal_alarm_irq, - hisi_thermal_alarm_irq_thread, - 0, "hisi_thermal", data); - if (ret < 0) { - dev_err(&pdev->dev, "failed to request alarm irq: %d\n", ret); - return ret; - } - platform_set_drvdata(pdev, data); data->clk = devm_clk_get(&pdev->dev, "thermal_clk"); @@ -357,6 +348,15 @@ static int hisi_thermal_probe(struct platform_device *pdev) hisi_thermal_toggle_sensor(&data->sensors[i], true); } + ret = devm_request_threaded_irq(&pdev->dev, data->irq, + hisi_thermal_alarm_irq, + hisi_thermal_alarm_irq_thread, + 0, "hisi_thermal", data); + if (ret < 0) { + dev_err(&pdev->dev, "failed to request alarm irq: %d\n", ret); + return ret; + } + enable_irq(data->irq); return 0; From 02c17c0f825c2d22de05c18e97982796b634eff3 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Thu, 19 Oct 2017 19:05:46 +0200 Subject: [PATCH 0711/1013] thermal/drivers/hisi: Simplify the temperature/step computation commit 48880b979cdc9ef5a70af020f42b8ba1e51dbd34 upstream. The step and the base temperature are fixed values, we can simplify the computation by converting the base temperature to milli celsius and use a pre-computed step value. That saves us a lot of mult + div for nothing at runtime. Take also the opportunity to change the function names to be consistent with the rest of the code. Signed-off-by: Daniel Lezcano Reviewed-by: Leo Yan Tested-by: Leo Yan Signed-off-by: Eduardo Valentin Signed-off-by: Kevin Wangtao Signed-off-by: Greg Kroah-Hartman --- drivers/thermal/hisi_thermal.c | 41 +++++++++++++++++++++++----------- 1 file changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index 26bb8e2a482f..ea77aac97ad3 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -35,8 +35,9 @@ #define TEMP0_RST_MSK (0x1C) #define TEMP0_VALUE (0x28) -#define HISI_TEMP_BASE (-60) +#define HISI_TEMP_BASE (-60000) #define HISI_TEMP_RESET (100000) +#define HISI_TEMP_STEP (784) #define HISI_MAX_SENSORS 4 @@ -61,19 +62,32 @@ struct hisi_thermal_data { void __iomem *regs; }; -/* in millicelsius */ -static inline int _step_to_temp(int step) +/* + * The temperature computation on the tsensor is as follow: + * Unit: millidegree Celsius + * Step: 255/200 (0.7843) + * Temperature base: -60°C + * + * The register is programmed in temperature steps, every step is 784 + * millidegree and begins at -60 000 m°C + * + * The temperature from the steps: + * + * Temp = TempBase + (steps x 784) + * + * and the steps from the temperature: + * + * steps = (Temp - TempBase) / 784 + * + */ +static inline int hisi_thermal_step_to_temp(int step) { - /* - * Every step equals (1 * 200) / 255 celsius, and finally - * need convert to millicelsius. 
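The pre-computed step can be sanity-checked in isolation (stand-alone illustration, not driver code): 255 register steps span 200 degrees, so one step is 200000 / 255 ≈ 784 m°C, and the two conversions introduced below behave as follows:

    #include <stdio.h>

    #define HISI_TEMP_BASE (-60000)   /* millidegrees Celsius */
    #define HISI_TEMP_STEP (784)      /* ~= 200000 / 255      */

    static int step_to_temp(int step) { return HISI_TEMP_BASE + step * HISI_TEMP_STEP; }
    static int temp_to_step(int temp) { return (temp - HISI_TEMP_BASE) / HISI_TEMP_STEP; }

    int main(void)
    {
            printf("%d\n", step_to_temp(0));     /* -60000: bottom of the range */
            printf("%d\n", step_to_temp(255));   /* 139920: top of the range    */
            printf("%d\n", temp_to_step(65000)); /* 159 steps for a 65 C trip   */
            return 0;
    }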
- */ - return (HISI_TEMP_BASE * 1000 + (step * 200000 / 255)); + return HISI_TEMP_BASE + (step * HISI_TEMP_STEP); } -static inline long _temp_to_step(long temp) +static inline long hisi_thermal_temp_to_step(long temp) { - return ((temp - HISI_TEMP_BASE * 1000) * 255) / 200000; + return (temp - HISI_TEMP_BASE) / HISI_TEMP_STEP; } static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, @@ -99,7 +113,7 @@ static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, usleep_range(3000, 5000); val = readl(data->regs + TEMP0_VALUE); - val = _step_to_temp(val); + val = hisi_thermal_step_to_temp(val); mutex_unlock(&data->thermal_lock); @@ -126,10 +140,11 @@ static void hisi_thermal_enable_bind_irq_sensor writel((sensor->id << 12), data->regs + TEMP0_CFG); /* enable for interrupt */ - writel(_temp_to_step(sensor->thres_temp) | 0x0FFFFFF00, + writel(hisi_thermal_temp_to_step(sensor->thres_temp) | 0x0FFFFFF00, data->regs + TEMP0_TH); - writel(_temp_to_step(HISI_TEMP_RESET), data->regs + TEMP0_RST_TH); + writel(hisi_thermal_temp_to_step(HISI_TEMP_RESET), + data->regs + TEMP0_RST_TH); /* enable module */ writel(0x1, data->regs + TEMP0_RST_MSK); From 5431aef93678a1f91e8b3ab41076f2cd1be34353 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Thu, 19 Oct 2017 19:05:47 +0200 Subject: [PATCH 0712/1013] thermal/drivers/hisi: Fix multiple alarm interrupts firing commit db2b0332608c8e648ea1e44727d36ad37cdb56cb upstream. The DT specifies a threshold of 65000, we setup the register with a value in the temperature resolution for the controller, 64656. When we reach 64656, the interrupt fires, the interrupt is disabled. Then the irq thread runs and calls thermal_zone_device_update() which will call in turn hisi_thermal_get_temp(). The function will look if the temperature decreased, assuming it was more than 65000, but that is not the case because the current temperature is 64656 (because of the rounding when setting the threshold). This condition being true, we re-enable the interrupt which fires immediately after exiting the irq thread. That happens again and again until the temperature goes to more than 65000. Potentially, there is here an interrupt storm if the temperature stabilizes at this temperature. A very unlikely case but possible. In any case, it does not make sense to handle dozens of alarm interrupt for nothing. Fix this by rounding the threshold value to the controller resolution so the check against the threshold is consistent with the one set in the controller. 
Signed-off-by: Daniel Lezcano Reviewed-by: Leo Yan Tested-by: Leo Yan Signed-off-by: Eduardo Valentin Signed-off-by: Kevin Wangtao Signed-off-by: Greg Kroah-Hartman --- drivers/thermal/hisi_thermal.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c index ea77aac97ad3..6d8906d65476 100644 --- a/drivers/thermal/hisi_thermal.c +++ b/drivers/thermal/hisi_thermal.c @@ -90,6 +90,12 @@ static inline long hisi_thermal_temp_to_step(long temp) return (temp - HISI_TEMP_BASE) / HISI_TEMP_STEP; } +static inline long hisi_thermal_round_temp(int temp) +{ + return hisi_thermal_step_to_temp( + hisi_thermal_temp_to_step(temp)); +} + static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data, struct hisi_thermal_sensor *sensor) { @@ -245,7 +251,7 @@ static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev) sensor = &data->sensors[data->irq_bind_sensor]; dev_crit(&data->pdev->dev, "THERMAL ALARM: T > %d\n", - sensor->thres_temp / 1000); + sensor->thres_temp); mutex_unlock(&data->thermal_lock); for (i = 0; i < HISI_MAX_SENSORS; i++) { @@ -284,7 +290,7 @@ static int hisi_thermal_register_sensor(struct platform_device *pdev, for (i = 0; i < of_thermal_get_ntrips(sensor->tzd); i++) { if (trip[i].type == THERMAL_TRIP_PASSIVE) { - sensor->thres_temp = trip[i].temperature; + sensor->thres_temp = hisi_thermal_round_temp(trip[i].temperature); break; } } From 0237a0a456563d461814111db15ae26666ce730e Mon Sep 17 00:00:00 2001 From: Peter Hutterer Date: Mon, 4 Dec 2017 10:26:17 +1000 Subject: [PATCH 0713/1013] platform/x86: asus-wireless: send an EV_SYN/SYN_REPORT between state changes commit bff5bf9db1c9453ffd0a78abed3e2d040c092fd9 upstream. Sending the switch state change twice within the same frame is invalid evdev protocol and only works if the client handles keys immediately as well. Processing events immediately is incorrect, it forces a fake order of events that does not exist on the device. Recent versions of libinput changed to only process the device state and SYN_REPORT time, so now the key event is lost. https://bugs.freedesktop.org/show_bug.cgi?id=104041 Signed-off-by: Peter Hutterer Signed-off-by: Darren Hart (VMware) Signed-off-by: Greg Kroah-Hartman --- drivers/platform/x86/asus-wireless.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/platform/x86/asus-wireless.c b/drivers/platform/x86/asus-wireless.c index f3796164329e..d4aeac3477f5 100644 --- a/drivers/platform/x86/asus-wireless.c +++ b/drivers/platform/x86/asus-wireless.c @@ -118,6 +118,7 @@ static void asus_wireless_notify(struct acpi_device *adev, u32 event) return; } input_report_key(data->idev, KEY_RFKILL, 1); + input_sync(data->idev); input_report_key(data->idev, KEY_RFKILL, 0); input_sync(data->idev); } From 7d7545295e714751fa236c6284167342b066d825 Mon Sep 17 00:00:00 2001 From: "Kirill A. Shutemov" Date: Tue, 7 Nov 2017 11:33:37 +0300 Subject: [PATCH 0714/1013] mm/sparsemem: Fix ARM64 boot crash when CONFIG_SPARSEMEM_EXTREME=y commit 629a359bdb0e0652a8227b4ff3125431995fec6e upstream. Since commit: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") we allocate the mem_section array dynamically in sparse_memory_present_with_active_regions(), but some architectures, like arm64, don't call the routine to initialize sparsemem. Let's move the initialization into memory_present() it should cover all architectures. 
Reported-and-tested-by: Sudeep Holla Tested-by: Bjorn Andersson Signed-off-by: Kirill A. Shutemov Acked-by: Will Deacon Cc: Andrew Morton Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-mm@kvack.org Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Link: http://lkml.kernel.org/r/20171107083337.89952-1-kirill.shutemov@linux.intel.com Signed-off-by: Ingo Molnar Cc: Dan Rue Cc: Naresh Kamboju Signed-off-by: Greg Kroah-Hartman --- mm/page_alloc.c | 10 ---------- mm/sparse.c | 10 ++++++++++ 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 02cf30bd01cd..d51c2087c498 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5651,16 +5651,6 @@ void __init sparse_memory_present_with_active_regions(int nid) unsigned long start_pfn, end_pfn; int i, this_nid; -#ifdef CONFIG_SPARSEMEM_EXTREME - if (!mem_section) { - unsigned long size, align; - - size = sizeof(struct mem_section) * NR_SECTION_ROOTS; - align = 1 << (INTERNODE_CACHE_SHIFT); - mem_section = memblock_virt_alloc(size, align); - } -#endif - for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, &this_nid) memory_present(this_nid, start_pfn, end_pfn); } diff --git a/mm/sparse.c b/mm/sparse.c index 044138852baf..60805abf98af 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -207,6 +207,16 @@ void __init memory_present(int nid, unsigned long start, unsigned long end) { unsigned long pfn; +#ifdef CONFIG_SPARSEMEM_EXTREME + if (unlikely(!mem_section)) { + unsigned long size, align; + + size = sizeof(struct mem_section) * NR_SECTION_ROOTS; + align = 1 << (INTERNODE_CACHE_SHIFT); + mem_section = memblock_virt_alloc(size, align); + } +#endif + start &= PAGE_SECTION_MASK; mminit_validate_memmodel_limits(&start, &end); for (pfn = start; pfn < end; pfn += PAGES_PER_SECTION) { From 2b3ea8ceb2bb71e9e58527661261dba127137d9b Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:22:59 +0100 Subject: [PATCH 0715/1013] bpf: fix branch pruning logic From: Alexei Starovoitov [ Upstream commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 ] when the verifier detects that register contains a runtime constant and it's compared with another constant it will prune exploration of the branch that is guaranteed not to be taken at runtime. This is all correct, but malicious program may be constructed in such a way that it always has a constant comparison and the other branch is never taken under any conditions. In this case such path through the program will not be explored by the verifier. It won't be taken at run-time either, but since all instructions are JITed the malicious program may cause JITs to complain about using reserved fields, etc. To fix the issue we have to track the instructions explored by the verifier and sanitize instructions that are dead at run time with NOPs. We cannot reject such dead code, since llvm generates it for valid C code, since it doesn't do as much data flow analysis as the verifier does. 
Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)") Signed-off-by: Alexei Starovoitov Acked-by: Daniel Borkmann Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- include/linux/bpf_verifier.h | 2 +- kernel/bpf/verifier.c | 27 +++++++++++++++++++++++++++ 2 files changed, 28 insertions(+), 1 deletion(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index b8d200f60a40..5d6de3b57758 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -110,7 +110,7 @@ struct bpf_insn_aux_data { struct bpf_map *map_ptr; /* pointer for call insn into lookup_elem */ }; int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ - int converted_op_size; /* the valid value width after perceived conversion */ + bool seen; /* this insn was processed by the verifier */ }; #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */ diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c48ca2a34b5e..201d6b965811 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3665,6 +3665,7 @@ static int do_check(struct bpf_verifier_env *env) if (err) return err; + env->insn_aux_data[insn_idx].seen = true; if (class == BPF_ALU || class == BPF_ALU64) { err = check_alu_op(env, insn); if (err) @@ -3855,6 +3856,7 @@ static int do_check(struct bpf_verifier_env *env) return err; insn_idx++; + env->insn_aux_data[insn_idx].seen = true; } else { verbose("invalid BPF_LD mode\n"); return -EINVAL; @@ -4035,6 +4037,7 @@ static int adjust_insn_aux_data(struct bpf_verifier_env *env, u32 prog_len, u32 off, u32 cnt) { struct bpf_insn_aux_data *new_data, *old_data = env->insn_aux_data; + int i; if (cnt == 1) return 0; @@ -4044,6 +4047,8 @@ static int adjust_insn_aux_data(struct bpf_verifier_env *env, u32 prog_len, memcpy(new_data, old_data, sizeof(struct bpf_insn_aux_data) * off); memcpy(new_data + off + cnt - 1, old_data + off, sizeof(struct bpf_insn_aux_data) * (prog_len - off - cnt + 1)); + for (i = off; i < off + cnt - 1; i++) + new_data[i].seen = true; env->insn_aux_data = new_data; vfree(old_data); return 0; @@ -4062,6 +4067,25 @@ static struct bpf_prog *bpf_patch_insn_data(struct bpf_verifier_env *env, u32 of return new_prog; } +/* The verifier does more data flow analysis than llvm and will not explore + * branches that are dead at run time. Malicious programs can have dead code + * too. Therefore replace all dead at-run-time code with nops. 
+ */ +static void sanitize_dead_code(struct bpf_verifier_env *env) +{ + struct bpf_insn_aux_data *aux_data = env->insn_aux_data; + struct bpf_insn nop = BPF_MOV64_REG(BPF_REG_0, BPF_REG_0); + struct bpf_insn *insn = env->prog->insnsi; + const int insn_cnt = env->prog->len; + int i; + + for (i = 0; i < insn_cnt; i++) { + if (aux_data[i].seen) + continue; + memcpy(insn + i, &nop, sizeof(nop)); + } +} + /* convert load instructions that access fields of 'struct __sk_buff' * into sequence of instructions that access fields of 'struct sk_buff' */ @@ -4378,6 +4402,9 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr) while (pop_stack(env, NULL) >= 0); free_states(env); + if (ret == 0) + sanitize_dead_code(env); + if (ret == 0) /* program is valid, convert *(u32*)(ctx + off) accesses */ ret = convert_ctx_accesses(env); From a23244e8845f510ce3ba8b77b32cdd3d3d8ae853 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:00 +0100 Subject: [PATCH 0716/1013] bpf: fix corruption on concurrent perf_event_output calls [ Upstream commit 283ca526a9bd75aed7350220d7b1f8027d99c3fd ] When tracing and networking programs are both attached in the system and both use event-output helpers that eventually call into perf_event_output(), then we could end up in a situation where the tracing attached program runs in user context while a cls_bpf program is triggered on that same CPU out of softirq context. Since both rely on the same per-cpu perf_sample_data, we could potentially corrupt it. This can only ever happen in a combination of the two types; all tracing programs use a bpf_prog_active counter to bail out in case a program is already running on that CPU out of a different context. XDP and cls_bpf programs by themselves don't have this issue as they run in the same context only. Therefore, split both perf_sample_data so they cannot be accessed from each other. 
Fixes: 20b9d7ac4852 ("bpf: avoid excessive stack usage for perf_sample_data") Reported-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Tested-by: Song Liu Acked-by: Alexei Starovoitov Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman --- kernel/trace/bpf_trace.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index dc498b605d5d..6350f64d5aa4 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -293,14 +293,13 @@ static const struct bpf_func_proto bpf_perf_event_read_proto = { .arg2_type = ARG_ANYTHING, }; -static DEFINE_PER_CPU(struct perf_sample_data, bpf_sd); +static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd); static __always_inline u64 __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, - u64 flags, struct perf_raw_record *raw) + u64 flags, struct perf_sample_data *sd) { struct bpf_array *array = container_of(map, struct bpf_array, map); - struct perf_sample_data *sd = this_cpu_ptr(&bpf_sd); unsigned int cpu = smp_processor_id(); u64 index = flags & BPF_F_INDEX_MASK; struct bpf_event_entry *ee; @@ -323,8 +322,6 @@ __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, if (unlikely(event->oncpu != cpu)) return -EOPNOTSUPP; - perf_sample_data_init(sd, 0, 0); - sd->raw = raw; perf_event_output(event, sd, regs); return 0; } @@ -332,6 +329,7 @@ __bpf_perf_event_output(struct pt_regs *regs, struct bpf_map *map, BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, u64, flags, void *, data, u64, size) { + struct perf_sample_data *sd = this_cpu_ptr(&bpf_trace_sd); struct perf_raw_record raw = { .frag = { .size = size, @@ -342,7 +340,10 @@ BPF_CALL_5(bpf_perf_event_output, struct pt_regs *, regs, struct bpf_map *, map, if (unlikely(flags & ~(BPF_F_INDEX_MASK))) return -EINVAL; - return __bpf_perf_event_output(regs, map, flags, &raw); + perf_sample_data_init(sd, 0, 0); + sd->raw = &raw; + + return __bpf_perf_event_output(regs, map, flags, sd); } static const struct bpf_func_proto bpf_perf_event_output_proto = { @@ -357,10 +358,12 @@ static const struct bpf_func_proto bpf_perf_event_output_proto = { }; static DEFINE_PER_CPU(struct pt_regs, bpf_pt_regs); +static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd); u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy) { + struct perf_sample_data *sd = this_cpu_ptr(&bpf_misc_sd); struct pt_regs *regs = this_cpu_ptr(&bpf_pt_regs); struct perf_raw_frag frag = { .copy = ctx_copy, @@ -378,8 +381,10 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, }; perf_fetch_caller_regs(regs); + perf_sample_data_init(sd, 0, 0); + sd->raw = &raw; - return __bpf_perf_event_output(regs, map, flags, &raw); + return __bpf_perf_event_output(regs, map, flags, sd); } BPF_CALL_0(bpf_get_current_task) From 83ab155d144922cb7421fb975e500901185e7644 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:01 +0100 Subject: [PATCH 0717/1013] bpf, s390x: do not reload skb pointers in non-skb context [ Upstream commit 6d59b7dbf72ed20d0138e2f9b75ca3d4a9d4faca ] The assumption of unconditionally reloading skb pointers on BPF helper calls where bpf_helper_changes_pkt_data() holds true is wrong. There can be different contexts where the BPF helper would enforce a reload such as in case of XDP. 
Here, we do have a struct xdp_buff instead of struct sk_buff as context, thus this will access garbage. JITs only ever need to deal with cached skb pointer reload when ld_abs/ind was seen, therefore guard the reload behind SEEN_SKB only. Tested on s390x. Fixes: 9db7f2b81880 ("s390/bpf: recache skb->data/hlen for skb_vlan_push/pop") Signed-off-by: Daniel Borkmann Acked-by: Alexei Starovoitov Cc: Michael Holzheu Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman --- arch/s390/net/bpf_jit_comp.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c index b15cd2f0320f..33e2785f6842 100644 --- a/arch/s390/net/bpf_jit_comp.c +++ b/arch/s390/net/bpf_jit_comp.c @@ -55,8 +55,7 @@ struct bpf_jit { #define SEEN_LITERAL 8 /* code uses literals */ #define SEEN_FUNC 16 /* calls C functions */ #define SEEN_TAIL_CALL 32 /* code uses tail calls */ -#define SEEN_SKB_CHANGE 64 /* code changes skb data */ -#define SEEN_REG_AX 128 /* code uses constant blinding */ +#define SEEN_REG_AX 64 /* code uses constant blinding */ #define SEEN_STACK (SEEN_FUNC | SEEN_MEM | SEEN_SKB) /* @@ -448,12 +447,12 @@ static void bpf_jit_prologue(struct bpf_jit *jit) EMIT6_DISP_LH(0xe3000000, 0x0024, REG_W1, REG_0, REG_15, 152); } - if (jit->seen & SEEN_SKB) + if (jit->seen & SEEN_SKB) { emit_load_skb_data_hlen(jit); - if (jit->seen & SEEN_SKB_CHANGE) /* stg %b1,ST_OFF_SKBP(%r0,%r15) */ EMIT6_DISP_LH(0xe3000000, 0x0024, BPF_REG_1, REG_0, REG_15, STK_OFF_SKBP); + } } /* @@ -983,8 +982,8 @@ static noinline int bpf_jit_insn(struct bpf_jit *jit, struct bpf_prog *fp, int i EMIT2(0x0d00, REG_14, REG_W1); /* lgr %b0,%r2: load return value into %b0 */ EMIT4(0xb9040000, BPF_REG_0, REG_2); - if (bpf_helper_changes_pkt_data((void *)func)) { - jit->seen |= SEEN_SKB_CHANGE; + if ((jit->seen & SEEN_SKB) && + bpf_helper_changes_pkt_data((void *)func)) { /* lg %b1,ST_OFF_SKBP(%r15) */ EMIT6_DISP_LH(0xe3000000, 0x0004, BPF_REG_1, REG_0, REG_15, STK_OFF_SKBP); From 8a681dfd8fb253cd3cac03a59e2867f9baad7934 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:02 +0100 Subject: [PATCH 0718/1013] bpf, ppc64: do not reload skb pointers in non-skb context [ Upstream commit 87338c8e2cbb317b5f757e6172f94e2e3799cd20 ] The assumption of unconditionally reloading skb pointers on BPF helper calls where bpf_helper_changes_pkt_data() holds true is wrong. There can be different contexts where the helper would enforce a reload such as in case of XDP. Here, we do have a struct xdp_buff instead of struct sk_buff as context, thus this will access garbage. JITs only ever need to deal with cached skb pointer reload when ld_abs/ind was seen, therefore guard the reload behind SEEN_SKB. Fixes: 156d0e290e96 ("powerpc/ebpf/jit: Implement JIT compiler for extended BPF") Signed-off-by: Daniel Borkmann Reviewed-by: Naveen N. 
Rao Acked-by: Alexei Starovoitov Tested-by: Sandipan Das Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/net/bpf_jit_comp64.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c index a66e64b0b251..5d115bd32539 100644 --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -762,7 +762,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, func = (u8 *) __bpf_call_base + imm; /* Save skb pointer if we need to re-cache skb data */ - if (bpf_helper_changes_pkt_data(func)) + if ((ctx->seen & SEEN_SKB) && + bpf_helper_changes_pkt_data(func)) PPC_BPF_STL(3, 1, bpf_jit_stack_local(ctx)); bpf_jit_emit_func_call(image, ctx, (u64)func); @@ -771,7 +772,8 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, PPC_MR(b2p[BPF_REG_0], 3); /* refresh skb cache */ - if (bpf_helper_changes_pkt_data(func)) { + if ((ctx->seen & SEEN_SKB) && + bpf_helper_changes_pkt_data(func)) { /* reload skb pointer to r3 */ PPC_BPF_LL(3, 1, bpf_jit_stack_local(ctx)); bpf_jit_emit_skb_loads(image, ctx); From 82a9d62f603f0cb5549c4ca554f06e70510b7296 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:03 +0100 Subject: [PATCH 0719/1013] bpf, sparc: fix usage of wrong reg for load_skb_regs after call [ Upstream commit 07aee94394547721ac168cbf4e1c09c14a5fe671 ] When LD_ABS/IND is used in the program, and we have a BPF helper call that changes packet data (bpf_helper_changes_pkt_data() returns true), then in case of sparc JIT, we try to reload cached skb data from bpf2sparc[BPF_REG_6]. However, there is no such guarantee or assumption that skb sits in R6 at this point, all helpers changing skb data only have a guarantee that skb sits in R1. Therefore, store BPF R1 in L7 temporarily and after procedure call use L7 to reload cached skb data. skb sitting in R6 is only true at the time when LD_ABS/IND is executed. Fixes: 7a12b5031c6b ("sparc64: Add eBPF JIT.") Signed-off-by: Daniel Borkmann Acked-by: David S. Miller Acked-by: Alexei Starovoitov Signed-off-by: Alexei Starovoitov Signed-off-by: Greg Kroah-Hartman --- arch/sparc/net/bpf_jit_comp_64.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c index 5765e7e711f7..ff5f9cb3039a 100644 --- a/arch/sparc/net/bpf_jit_comp_64.c +++ b/arch/sparc/net/bpf_jit_comp_64.c @@ -1245,14 +1245,16 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx) u8 *func = ((u8 *)__bpf_call_base) + imm; ctx->saw_call = true; + if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func)) + emit_reg_move(bpf2sparc[BPF_REG_1], L7, ctx); emit_call((u32 *)func, ctx); emit_nop(ctx); emit_reg_move(O0, bpf2sparc[BPF_REG_0], ctx); - if (bpf_helper_changes_pkt_data(func) && ctx->saw_ld_abs_ind) - load_skb_regs(ctx, bpf2sparc[BPF_REG_6]); + if (ctx->saw_ld_abs_ind && bpf_helper_changes_pkt_data(func)) + load_skb_regs(ctx, L7); break; } From 4d54f7df5131d67f653f674003ec5f52c9818b53 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:04 +0100 Subject: [PATCH 0720/1013] bpf/verifier: fix bounds calculation on BPF_RSH From: Edward Cree [ Upstream commit 4374f256ce8182019353c0c639bb8d0695b4c941 ] Incorrect signed bounds were being computed. 
If the old upper signed bound was positive and the old lower signed bound was negative, this could cause the new upper signed bound to be too low, leading to security issues. Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values") Reported-by: Jann Horn Signed-off-by: Edward Cree Acked-by: Alexei Starovoitov [jannh@google.com: changed description to reflect bug impact] Signed-off-by: Jann Horn Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 30 ++++++++++++++++-------------- 1 file changed, 16 insertions(+), 14 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 201d6b965811..c6ec04f94525 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2157,20 +2157,22 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, mark_reg_unknown(regs, insn->dst_reg); break; } - /* BPF_RSH is an unsigned shift, so make the appropriate casts */ - if (dst_reg->smin_value < 0) { - if (umin_val) { - /* Sign bit will be cleared */ - dst_reg->smin_value = 0; - } else { - /* Lost sign bit information */ - dst_reg->smin_value = S64_MIN; - dst_reg->smax_value = S64_MAX; - } - } else { - dst_reg->smin_value = - (u64)(dst_reg->smin_value) >> umax_val; - } + /* BPF_RSH is an unsigned shift. If the value in dst_reg might + * be negative, then either: + * 1) src_reg might be zero, so the sign bit of the result is + * unknown, so we lose our signed bounds + * 2) it's known negative, thus the unsigned bounds capture the + * signed bounds + * 3) the signed bounds cross zero, so they tell us nothing + * about the result + * If the value in dst_reg is known nonnegative, then again the + * unsigned bounts capture the signed bounds. + * Thus, in all cases it suffices to blow away our signed bounds + * and rely on inferring new ones from the unsigned bounds and + * var_off of the result. + */ + dst_reg->smin_value = S64_MIN; + dst_reg->smax_value = S64_MAX; if (src_known) dst_reg->var_off = tnum_rshift(dst_reg->var_off, umin_val); From 6e12ea4fb45ca86cdd7425276b6993455fee947a Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:05 +0100 Subject: [PATCH 0721/1013] bpf: fix incorrect sign extension in check_alu_op() From: Jann Horn [ Upstream commit 95a762e2c8c942780948091f8f2a4f32fce1ac6f ] Distinguish between BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit) and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit); only perform sign extension in the first case. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16995 for this issue. 
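For reference, a stand-alone user-space illustration of the two immediate semantics the verifier now distinguishes; the helper names are made up for the example, the conversions themselves follow the documented eBPF MOV behaviour:

    #include <stdint.h>
    #include <stdio.h>

    /* BPF_ALU64|BPF_MOV|BPF_K: the 32-bit immediate is sign-extended to 64 bits */
    static uint64_t mov64_imm(int32_t imm) { return (uint64_t)(int64_t)imm; }

    /* BPF_ALU|BPF_MOV|BPF_K: the 32-bit immediate is zero-extended, upper bits cleared */
    static uint64_t mov32_imm(int32_t imm) { return (uint64_t)(uint32_t)imm; }

    int main(void)
    {
            int32_t imm = -1;

            /* prints 0xffffffffffffffff vs. 0x00000000ffffffff */
            printf("ALU64 MOV: 0x%016llx\n", (unsigned long long)mov64_imm(imm));
            printf("ALU32 MOV: 0x%016llx\n", (unsigned long long)mov32_imm(imm));
            return 0;
    }

Treating the second case as sign-extended let a program convince the verifier that a register was a small negative constant while it actually held 0xffffffff at runtime.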
v3: - add CVE number (Ben Hutchings) Fixes: 484611357c19 ("bpf: allow access into map value arrays") Signed-off-by: Jann Horn Acked-by: Edward Cree Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index c6ec04f94525..2811ced3c2a9 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -2374,7 +2374,13 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn) * remember the value we stored into this reg */ regs[insn->dst_reg].type = SCALAR_VALUE; - __mark_reg_known(regs + insn->dst_reg, insn->imm); + if (BPF_CLASS(insn->code) == BPF_ALU64) { + __mark_reg_known(regs + insn->dst_reg, + insn->imm); + } else { + __mark_reg_known(regs + insn->dst_reg, + (u32)insn->imm); + } } } else if (opcode > BPF_END) { From bf5ee24e87e39548bf30d4e18e479e61a5a98336 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:06 +0100 Subject: [PATCH 0722/1013] bpf: fix incorrect tracking of register size truncation From: Jann Horn [ Upstream commit 0c17d1d2c61936401f4702e1846e2c19b200f958 ] Properly handle register truncation to a smaller size. The old code first mirrors the clearing of the high 32 bits in the bitwise tristate representation, which is correct. But then, it computes the new arithmetic bounds as the intersection between the old arithmetic bounds and the bounds resulting from the bitwise tristate representation. Therefore, when coerce_reg_to_32() is called on a number with bounds [0xffff'fff8, 0x1'0000'0007], the verifier computes [0xffff'fff8, 0xffff'ffff] as bounds of the truncated number. This is incorrect: The truncated number could also be in the range [0, 7], and no meaningful arithmetic bounds can be computed in that case apart from the obvious [0, 0xffff'ffff]. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16996 for this issue. 
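A stand-alone rendering of the corrected bounds computation (umin/umax only), mirroring the coerce_reg_to_size() logic added below and reusing the [0xffff'fff8, 0x1'0000'0007] example from above; the function name is just for the example:

    #include <stdint.h>
    #include <stdio.h>

    /* valid for size < 8, as in the kernel helper */
    static void truncate_bounds(uint64_t *umin, uint64_t *umax, int size)
    {
            uint64_t mask = ((uint64_t)1 << (size * 8)) - 1;

            if ((*umin & ~mask) == (*umax & ~mask)) {
                    /* high bits identical: masking both ends keeps a valid range */
                    *umin &= mask;
                    *umax &= mask;
            } else {
                    /* range crosses a 2^(size*8) boundary: only [0, mask] is safe */
                    *umin = 0;
                    *umax = mask;
            }
    }

    int main(void)
    {
            uint64_t umin = 0xfffffff8ULL, umax = 0x100000007ULL;

            truncate_bounds(&umin, &umax, 4);
            /* prints [0x0, 0xffffffff], not the bogus [0xfffffff8, 0xffffffff] */
            printf("[0x%llx, 0x%llx]\n",
                   (unsigned long long)umin, (unsigned long long)umax);
            return 0;
    }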
v2: - flip the mask during arithmetic bounds calculation (Ben Hutchings) v3: - add CVE number (Ben Hutchings) Fixes: b03c9f9fdc37 ("bpf/verifier: track signed and unsigned min/max values") Signed-off-by: Jann Horn Acked-by: Edward Cree Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 44 ++++++++++++++++++++++++++----------------- 1 file changed, 27 insertions(+), 17 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 2811ced3c2a9..1e2afb264c31 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1068,6 +1068,29 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, return check_generic_ptr_alignment(reg, pointer_desc, off, size, strict); } +/* truncate register to smaller size (in bytes) + * must be called with size < BPF_REG_SIZE + */ +static void coerce_reg_to_size(struct bpf_reg_state *reg, int size) +{ + u64 mask; + + /* clear high bits in bit representation */ + reg->var_off = tnum_cast(reg->var_off, size); + + /* fix arithmetic bounds */ + mask = ((u64)1 << (size * 8)) - 1; + if ((reg->umin_value & ~mask) == (reg->umax_value & ~mask)) { + reg->umin_value &= mask; + reg->umax_value &= mask; + } else { + reg->umin_value = 0; + reg->umax_value = mask; + } + reg->smin_value = reg->umin_value; + reg->smax_value = reg->umax_value; +} + /* check whether memory at (regno + off) is accessible for t = (read | write) * if t==write, value_regno is a register which value is stored into memory * if t==read, value_regno is a register which will receive the value from memory @@ -1200,9 +1223,7 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ && state->regs[value_regno].type == SCALAR_VALUE) { /* b/h/w load zero-extends, mark upper bits as known 0 */ - state->regs[value_regno].var_off = tnum_cast( - state->regs[value_regno].var_off, size); - __update_reg_bounds(&state->regs[value_regno]); + coerce_reg_to_size(&state->regs[value_regno], size); } return err; } @@ -1742,14 +1763,6 @@ static int check_call(struct bpf_verifier_env *env, int func_id, int insn_idx) return 0; } -static void coerce_reg_to_32(struct bpf_reg_state *reg) -{ - /* clear high 32 bits */ - reg->var_off = tnum_cast(reg->var_off, 4); - /* Update bounds */ - __update_reg_bounds(reg); -} - static bool signed_add_overflows(s64 a, s64 b) { /* Do the add in u64, where overflow is well-defined */ @@ -1984,8 +1997,8 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, if (BPF_CLASS(insn->code) != BPF_ALU64) { /* 32-bit ALU ops are (32,32)->64 */ - coerce_reg_to_32(dst_reg); - coerce_reg_to_32(&src_reg); + coerce_reg_to_size(dst_reg, 4); + coerce_reg_to_size(&src_reg, 4); } smin_val = src_reg.smin_value; smax_val = src_reg.smax_value; @@ -2364,10 +2377,7 @@ static int check_alu_op(struct bpf_verifier_env *env, struct bpf_insn *insn) return -EACCES; } mark_reg_unknown(regs, insn->dst_reg); - /* high 32 bits are known zero. 
*/ - regs[insn->dst_reg].var_off = tnum_cast( - regs[insn->dst_reg].var_off, 4); - __update_reg_bounds(®s[insn->dst_reg]); + coerce_reg_to_size(®s[insn->dst_reg], 4); } } else { /* case: R = imm From 6c8e098d0324412d4ae9e06c7e611a96b87faf80 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:07 +0100 Subject: [PATCH 0723/1013] bpf: fix 32-bit ALU op verification From: Jann Horn [ Upstream commit 468f6eafa6c44cb2c5d8aad35e12f06c240a812a ] 32-bit ALU ops operate on 32-bit values and have 32-bit outputs. Adjust the verifier accordingly. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 1e2afb264c31..0c7e4c8a2b8a 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1984,6 +1984,10 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, return 0; } +/* WARNING: This function does calculations on 64-bit values, but the actual + * execution may occur on 32-bit values. Therefore, things like bitshifts + * need extra checks in the 32-bit case. + */ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, struct bpf_insn *insn, struct bpf_reg_state *dst_reg, @@ -1994,12 +1998,8 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, bool src_known, dst_known; s64 smin_val, smax_val; u64 umin_val, umax_val; + u64 insn_bitness = (BPF_CLASS(insn->code) == BPF_ALU64) ? 64 : 32; - if (BPF_CLASS(insn->code) != BPF_ALU64) { - /* 32-bit ALU ops are (32,32)->64 */ - coerce_reg_to_size(dst_reg, 4); - coerce_reg_to_size(&src_reg, 4); - } smin_val = src_reg.smin_value; smax_val = src_reg.smax_value; umin_val = src_reg.umin_value; @@ -2135,9 +2135,9 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, __update_reg_bounds(dst_reg); break; case BPF_LSH: - if (umax_val > 63) { - /* Shifts greater than 63 are undefined. This includes - * shifts by a negative number. + if (umax_val >= insn_bitness) { + /* Shifts greater than 31 or 63 are undefined. + * This includes shifts by a negative number. */ mark_reg_unknown(regs, insn->dst_reg); break; @@ -2163,9 +2163,9 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, __update_reg_bounds(dst_reg); break; case BPF_RSH: - if (umax_val > 63) { - /* Shifts greater than 63 are undefined. This includes - * shifts by a negative number. + if (umax_val >= insn_bitness) { + /* Shifts greater than 31 or 63 are undefined. + * This includes shifts by a negative number. */ mark_reg_unknown(regs, insn->dst_reg); break; @@ -2201,6 +2201,12 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, break; } + if (BPF_CLASS(insn->code) != BPF_ALU64) { + /* 32-bit ALU ops are (32,32)->32 */ + coerce_reg_to_size(dst_reg, 4); + coerce_reg_to_size(&src_reg, 4); + } + __reg_deduce_bounds(dst_reg); __reg_bound_offset(dst_reg); return 0; From 2120fca0ecfb4552d27608d409ebd3403ce02ce4 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:08 +0100 Subject: [PATCH 0724/1013] bpf: fix missing error return in check_stack_boundary() From: Jann Horn Prevent indirect stack accesses at non-constant addresses, which would permit reading and corrupting spilled pointers. 
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 0c7e4c8a2b8a..8aa98a0591d6 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1303,6 +1303,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno, tnum_strn(tn_buf, sizeof(tn_buf), regs[regno].var_off); verbose("invalid variable stack read R%d var_off=%s\n", regno, tn_buf); + return -EACCES; } off = regs[regno].off + regs[regno].var_off.value; if (off >= 0 || off < -MAX_BPF_STACK || off + access_size > 0 || From c90268f7cbee0781331b96d1423d0f28a6183889 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:09 +0100 Subject: [PATCH 0725/1013] bpf: force strict alignment checks for stack pointers From: Jann Horn [ Upstream commit a5ec6ae161d72f01411169a938fa5f8baea16e8f ] Force strict alignment checks for stack pointers because the tracking of stack spills relies on it; unaligned stack accesses can lead to corruption of spilled registers, which is exploitable. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 8aa98a0591d6..8c353554628e 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1061,6 +1061,11 @@ static int check_ptr_alignment(struct bpf_verifier_env *env, break; case PTR_TO_STACK: pointer_desc = "stack "; + /* The stack spill tracking logic in check_stack_write() + * and check_stack_read() relies on stack accesses being + * aligned. + */ + strict = true; break; default: break; From cb56cc1b292b8b3f787fad89f1208f8e98d12c7d Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:10 +0100 Subject: [PATCH 0726/1013] bpf: don't prune branches when a scalar is replaced with a pointer From: Jann Horn [ Upstream commit 179d1c5602997fef5a940c6ddcf31212cbfebd14 ] This could be made safe by passing through a reference to env and checking for env->allow_ptr_leaks, but it would only work one way and is probably not worth the hassle - not doing it will not directly lead to program rejection. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Signed-off-by: Jann Horn Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- kernel/bpf/verifier.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 8c353554628e..5a30eda17c4f 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3337,15 +3337,14 @@ static bool regsafe(struct bpf_reg_state *rold, struct bpf_reg_state *rcur, return range_within(rold, rcur) && tnum_in(rold->var_off, rcur->var_off); } else { - /* if we knew anything about the old value, we're not - * equal, because we can't know anything about the - * scalar value of the pointer in the new value. + /* We're trying to use a pointer in place of a scalar. + * Even if the scalar was unbounded, this could lead to + * pointer leaks because scalars are allowed to leak + * while pointers are not. 
We could make this safe in + * special cases if root is calling us, but it's + * probably not worth the hassle. */ - return rold->umin_value == 0 && - rold->umax_value == U64_MAX && - rold->smin_value == S64_MIN && - rold->smax_value == S64_MAX && - tnum_is_unknown(rold->var_off); + return false; } case PTR_TO_MAP_VALUE: /* If the new min/max/var_off satisfy the old ones and From de31796c052e47c99b1bb342bc70aa826733e862 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:11 +0100 Subject: [PATCH 0727/1013] bpf: fix integer overflows From: Alexei Starovoitov [ Upstream commit bb7f0f989ca7de1153bd128a40a71709e339fa03 ] There were various issues related to the limited size of integers used in the verifier: - `off + size` overflow in __check_map_access() - `off + reg->off` overflow in check_mem_access() - `off + reg->var_off.value` overflow or 32-bit truncation of `reg->var_off.value` in check_mem_access() - 32-bit truncation in check_stack_boundary() Make sure that any integer math cannot overflow by not allowing pointer math with large values. Also reduce the scope of "scalar op scalar" tracking. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Reported-by: Jann Horn Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- include/linux/bpf_verifier.h | 4 +-- kernel/bpf/verifier.c | 48 ++++++++++++++++++++++++++++++++++++ 2 files changed, 50 insertions(+), 2 deletions(-) diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 5d6de3b57758..73bec75b74c8 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -15,11 +15,11 @@ * In practice this is far bigger than any realistic pointer offset; this limit * ensures that umax_value + (int)off + (int)size cannot overflow a u64. */ -#define BPF_MAX_VAR_OFF (1ULL << 31) +#define BPF_MAX_VAR_OFF (1 << 29) /* Maximum variable size permitted for ARG_CONST_SIZE[_OR_ZERO]. This ensures * that converting umax_value to int cannot overflow. */ -#define BPF_MAX_VAR_SIZ INT_MAX +#define BPF_MAX_VAR_SIZ (1 << 29) /* Liveness marks, used for registers and spilled-regs (in stack slots). * Read marks propagate upwards until they find a write mark; they record that diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 5a30eda17c4f..c5ff809e86d0 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1789,6 +1789,41 @@ static bool signed_sub_overflows(s64 a, s64 b) return res > a; } +static bool check_reg_sane_offset(struct bpf_verifier_env *env, + const struct bpf_reg_state *reg, + enum bpf_reg_type type) +{ + bool known = tnum_is_const(reg->var_off); + s64 val = reg->var_off.value; + s64 smin = reg->smin_value; + + if (known && (val >= BPF_MAX_VAR_OFF || val <= -BPF_MAX_VAR_OFF)) { + verbose("math between %s pointer and %lld is not allowed\n", + reg_type_str[type], val); + return false; + } + + if (reg->off >= BPF_MAX_VAR_OFF || reg->off <= -BPF_MAX_VAR_OFF) { + verbose("%s pointer offset %d is not allowed\n", + reg_type_str[type], reg->off); + return false; + } + + if (smin == S64_MIN) { + verbose("math between %s pointer and register with unbounded min value is not allowed\n", + reg_type_str[type]); + return false; + } + + if (smin >= BPF_MAX_VAR_OFF || smin <= -BPF_MAX_VAR_OFF) { + verbose("value %lld makes %s pointer be out of bounds\n", + smin, reg_type_str[type]); + return false; + } + + return true; +} + /* Handles arithmetic on a pointer and a scalar: computes new min/max and var_off. 
* Caller should also handle BPF_MOV case separately. * If we return -EACCES, caller may want to try again treating pointer as a @@ -1854,6 +1889,10 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, dst_reg->type = ptr_reg->type; dst_reg->id = ptr_reg->id; + if (!check_reg_sane_offset(env, off_reg, ptr_reg->type) || + !check_reg_sane_offset(env, ptr_reg, ptr_reg->type)) + return -EINVAL; + switch (opcode) { case BPF_ADD: /* We can take a fixed offset as long as it doesn't overflow @@ -1984,6 +2023,9 @@ static int adjust_ptr_min_max_vals(struct bpf_verifier_env *env, return -EACCES; } + if (!check_reg_sane_offset(env, dst_reg, ptr_reg->type)) + return -EINVAL; + __update_reg_bounds(dst_reg); __reg_deduce_bounds(dst_reg); __reg_bound_offset(dst_reg); @@ -2013,6 +2055,12 @@ static int adjust_scalar_min_max_vals(struct bpf_verifier_env *env, src_known = tnum_is_const(src_reg.var_off); dst_known = tnum_is_const(dst_reg->var_off); + if (!src_known && + opcode != BPF_ADD && opcode != BPF_SUB && opcode != BPF_AND) { + __mark_reg_unknown(dst_reg); + return 0; + } + switch (opcode) { case BPF_ADD: if (signed_add_overflows(dst_reg->smin_value, smin_val) || From d605778b613a535d258c1e6a931bc2153aa63185 Mon Sep 17 00:00:00 2001 From: Daniel Borkmann Date: Fri, 22 Dec 2017 16:23:12 +0100 Subject: [PATCH 0728/1013] selftests/bpf: add tests for recent bugfixes From: Jann Horn [ Upstream commit 2255f8d520b0a318fc6d387d0940854b2f522a7f ] These tests should cover the following cases: - MOV with both zero-extended and sign-extended immediates - implicit truncation of register contents via ALU32/MOV32 - implicit 32-bit truncation of ALU32 output - oversized register source operand for ALU32 shift - right-shift of a number that could be positive or negative - map access where adding the operation size to the offset causes signed 32-bit overflow - direct stack access at a ~4GiB offset Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests that should fail independent of what flags userspace passes. 
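For orientation, the minimal shape of a test_verifier entry; this one is not part of the patch, it is just the smallest accepted program. The entries added below additionally use .fixup_map1 to patch a real map fd into the marked instruction and, for REJECT cases, .errstr to match against the verifier log:

    {
            "minimal example: r0 = 0; exit",
            .insns = {
                    BPF_MOV64_IMM(BPF_REG_0, 0),
                    BPF_EXIT_INSN(),
            },
            .result = ACCEPT,
    },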
Signed-off-by: Jann Horn Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Signed-off-by: Greg Kroah-Hartman --- tools/testing/selftests/bpf/test_verifier.c | 549 +++++++++++++++++++- 1 file changed, 533 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c index 64ae21f64489..7a2d221c4702 100644 --- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -606,7 +606,6 @@ static struct bpf_test tests[] = { }, .errstr = "misaligned stack access", .result = REJECT, - .flags = F_LOAD_WITH_STRICT_ALIGNMENT, }, { "invalid map_fd for function call", @@ -1797,7 +1796,6 @@ static struct bpf_test tests[] = { }, .result = REJECT, .errstr = "misaligned stack access off (0x0; 0x0)+-8+2 size 8", - .flags = F_LOAD_WITH_STRICT_ALIGNMENT, }, { "PTR_TO_STACK store/load - bad alignment on reg", @@ -1810,7 +1808,6 @@ static struct bpf_test tests[] = { }, .result = REJECT, .errstr = "misaligned stack access off (0x0; 0x0)+-10+8 size 8", - .flags = F_LOAD_WITH_STRICT_ALIGNMENT, }, { "PTR_TO_STACK store/load - out of bounds low", @@ -6115,7 +6112,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6139,7 +6136,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6165,7 +6162,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R8 invalid mem access 'inv'", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6190,7 +6187,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R8 invalid mem access 'inv'", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6238,7 +6235,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6309,7 +6306,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6360,7 +6357,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6387,7 +6384,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6413,7 +6410,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6442,7 +6439,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6472,7 +6469,7 @@ static struct bpf_test tests[] = { BPF_JMP_IMM(BPF_JA, 0, 0, -7), }, .fixup_map1 = { 4 }, - .errstr = "R0 min value is negative", + .errstr = "unbounded min value", .result = REJECT, }, { @@ -6500,8 +6497,7 @@ static struct bpf_test tests[] = { BPF_EXIT_INSN(), }, .fixup_map1 = { 3 }, - .errstr_unpriv = "R0 pointer comparison prohibited", - .errstr = "R0 min value is negative", + .errstr = "unbounded 
min value", .result = REJECT, .result_unpriv = REJECT, }, @@ -6556,6 +6552,462 @@ static struct bpf_test tests[] = { .errstr = "R0 min value is negative, either use unsigned index or do a if (index >=0) check.", .result = REJECT, }, + { + "bounds check based on zero-extended MOV", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), + /* r2 = 0x0000'0000'ffff'ffff */ + BPF_MOV32_IMM(BPF_REG_2, 0xffffffff), + /* r2 = 0 */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32), + /* no-op */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2), + /* access at offset 0 */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .result = ACCEPT + }, + { + "bounds check based on sign-extended MOV. test1", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), + /* r2 = 0xffff'ffff'ffff'ffff */ + BPF_MOV64_IMM(BPF_REG_2, 0xffffffff), + /* r2 = 0xffff'ffff */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 32), + /* r0 = */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2), + /* access to OOB pointer */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "map_value pointer and 4294967295", + .result = REJECT + }, + { + "bounds check based on sign-extended MOV. test2", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), + /* r2 = 0xffff'ffff'ffff'ffff */ + BPF_MOV64_IMM(BPF_REG_2, 0xffffffff), + /* r2 = 0xfff'ffff */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_2, 36), + /* r0 = */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_2), + /* access to OOB pointer */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "R0 min value is outside of the array range", + .result = REJECT + }, + { + "bounds check based on reg_off + var_off + insn_off. test1", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1, + offsetof(struct __sk_buff, mark)), + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), + BPF_ALU64_IMM(BPF_AND, BPF_REG_6, 1), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, (1 << 29) - 1), + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, (1 << 29) - 1), + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 3), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 4 }, + .errstr = "value_size=8 off=1073741825", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { + "bounds check based on reg_off + var_off + insn_off. 
test2", + .insns = { + BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_1, + offsetof(struct __sk_buff, mark)), + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 4), + BPF_ALU64_IMM(BPF_AND, BPF_REG_6, 1), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, (1 << 30) - 1), + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_6), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, (1 << 29) - 1), + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 3), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 4 }, + .errstr = "value 1073741823", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_SCHED_CLS, + }, + { + "bounds check after truncation of non-boundary-crossing range", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), + /* r1 = [0x00, 0xff] */ + BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_MOV64_IMM(BPF_REG_2, 1), + /* r2 = 0x10'0000'0000 */ + BPF_ALU64_IMM(BPF_LSH, BPF_REG_2, 36), + /* r1 = [0x10'0000'0000, 0x10'0000'00ff] */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_1, BPF_REG_2), + /* r1 = [0x10'7fff'ffff, 0x10'8000'00fe] */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff), + /* r1 = [0x00, 0xff] */ + BPF_ALU32_IMM(BPF_SUB, BPF_REG_1, 0x7fffffff), + /* r1 = 0 */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), + /* no-op */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + /* access at offset 0 */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .result = ACCEPT + }, + { + "bounds check after truncation of boundary-crossing range (1)", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), + /* r1 = [0x00, 0xff] */ + BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = [0xffff'ff80, 0x1'0000'007f] */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = [0xffff'ff80, 0xffff'ffff] or + * [0x0000'0000, 0x0000'007f] + */ + BPF_ALU32_IMM(BPF_ADD, BPF_REG_1, 0), + BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = [0x00, 0xff] or + * [0xffff'ffff'0000'0080, 0xffff'ffff'ffff'ffff] + */ + BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = 0 or + * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff] + */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), + /* no-op or OOB pointer computation */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + /* potentially OOB access */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + /* not actually fully unbounded, but the bound is very high */ + .errstr = "R0 unbounded memory access", + .result = REJECT + }, + { + "bounds check after truncation of boundary-crossing range (2)", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + 
BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 9), + /* r1 = [0x00, 0xff] */ + BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = [0xffff'ff80, 0x1'0000'007f] */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = [0xffff'ff80, 0xffff'ffff] or + * [0x0000'0000, 0x0000'007f] + * difference to previous test: truncation via MOV32 + * instead of ALU32. + */ + BPF_MOV32_REG(BPF_REG_1, BPF_REG_1), + BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = [0x00, 0xff] or + * [0xffff'ffff'0000'0080, 0xffff'ffff'ffff'ffff] + */ + BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 0xffffff80 >> 1), + /* r1 = 0 or + * [0x00ff'ffff'ff00'0000, 0x00ff'ffff'ffff'ffff] + */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), + /* no-op or OOB pointer computation */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + /* potentially OOB access */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + /* not actually fully unbounded, but the bound is very high */ + .errstr = "R0 unbounded memory access", + .result = REJECT + }, + { + "bounds check after wrapping 32-bit addition", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5), + /* r1 = 0x7fff'ffff */ + BPF_MOV64_IMM(BPF_REG_1, 0x7fffffff), + /* r1 = 0xffff'fffe */ + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff), + /* r1 = 0 */ + BPF_ALU32_IMM(BPF_ADD, BPF_REG_1, 2), + /* no-op */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + /* access at offset 0 */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .result = ACCEPT + }, + { + "bounds check after shift with oversized count operand", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6), + BPF_MOV64_IMM(BPF_REG_2, 32), + BPF_MOV64_IMM(BPF_REG_1, 1), + /* r1 = (u32)1 << (u32)32 = ? 
*/ + BPF_ALU32_REG(BPF_LSH, BPF_REG_1, BPF_REG_2), + /* r1 = [0x0000, 0xffff] */ + BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0xffff), + /* computes unknown pointer, potentially OOB */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + /* potentially OOB access */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "R0 max value is outside of the array range", + .result = REJECT + }, + { + "bounds check after right shift of maybe-negative number", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6), + /* r1 = [0x00, 0xff] */ + BPF_LDX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + /* r1 = [-0x01, 0xfe] */ + BPF_ALU64_IMM(BPF_SUB, BPF_REG_1, 1), + /* r1 = 0 or 0xff'ffff'ffff'ffff */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), + /* r1 = 0 or 0xffff'ffff'ffff */ + BPF_ALU64_IMM(BPF_RSH, BPF_REG_1, 8), + /* computes unknown pointer, potentially OOB */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + /* potentially OOB access */ + BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_0, 0), + /* exit */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "R0 unbounded memory access", + .result = REJECT + }, + { + "bounds check map access with off+size signed 32bit overflow. test1", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x7ffffffe), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), + BPF_JMP_A(0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "map_value pointer and 2147483646", + .result = REJECT + }, + { + "bounds check map access with off+size signed 32bit overflow. test2", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 0x1fffffff), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), + BPF_JMP_A(0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "pointer offset 1073741822", + .result = REJECT + }, + { + "bounds check map access with off+size signed 32bit overflow. test3", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 0x1fffffff), + BPF_ALU64_IMM(BPF_SUB, BPF_REG_0, 0x1fffffff), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 2), + BPF_JMP_A(0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "pointer offset -1073741822", + .result = REJECT + }, + { + "bounds check map access with off+size signed 32bit overflow. 
test4", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), + BPF_EXIT_INSN(), + BPF_MOV64_IMM(BPF_REG_1, 1000000), + BPF_ALU64_IMM(BPF_MUL, BPF_REG_1, 1000000), + BPF_ALU64_REG(BPF_ADD, BPF_REG_0, BPF_REG_1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 2), + BPF_JMP_A(0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .errstr = "map_value pointer and 1000000000000", + .result = REJECT + }, + { + "pointer/scalar confusion in state equality check (way 1)", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), + BPF_JMP_A(1), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_10), + BPF_JMP_A(0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .result = ACCEPT, + .result_unpriv = REJECT, + .errstr_unpriv = "R0 leaks addr as return value" + }, + { + "pointer/scalar confusion in state equality check (way 2)", + .insns = { + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8), + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 2), + BPF_MOV64_REG(BPF_REG_0, BPF_REG_10), + BPF_JMP_A(1), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 3 }, + .result = ACCEPT, + .result_unpriv = REJECT, + .errstr_unpriv = "R0 leaks addr as return value" + }, { "variable-offset ctx access", .insns = { @@ -6597,6 +7049,71 @@ static struct bpf_test tests[] = { .result = REJECT, .prog_type = BPF_PROG_TYPE_LWT_IN, }, + { + "indirect variable-offset stack access", + .insns = { + /* Fill the top 8 bytes of the stack */ + BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + /* Get an unknown value */ + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0), + /* Make it small and 4-byte aligned */ + BPF_ALU64_IMM(BPF_AND, BPF_REG_2, 4), + BPF_ALU64_IMM(BPF_SUB, BPF_REG_2, 8), + /* add it to fp. We now have either fp-4 or fp-8, but + * we don't know which + */ + BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_10), + /* dereference it indirectly */ + BPF_LD_MAP_FD(BPF_REG_1, 0), + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, + BPF_FUNC_map_lookup_elem), + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_EXIT_INSN(), + }, + .fixup_map1 = { 5 }, + .errstr = "variable stack read R2", + .result = REJECT, + .prog_type = BPF_PROG_TYPE_LWT_IN, + }, + { + "direct stack access with 32-bit wraparound. test1", + .insns = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x7fffffff), + BPF_MOV32_IMM(BPF_REG_0, 0), + BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_EXIT_INSN() + }, + .errstr = "fp pointer and 2147483647", + .result = REJECT + }, + { + "direct stack access with 32-bit wraparound. 
test2", + .insns = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x3fffffff), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x3fffffff), + BPF_MOV32_IMM(BPF_REG_0, 0), + BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_EXIT_INSN() + }, + .errstr = "fp pointer and 1073741823", + .result = REJECT + }, + { + "direct stack access with 32-bit wraparound. test3", + .insns = { + BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x1fffffff), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 0x1fffffff), + BPF_MOV32_IMM(BPF_REG_0, 0), + BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0), + BPF_EXIT_INSN() + }, + .errstr = "fp pointer offset 1073741822", + .result = REJECT + }, { "liveness pruning and write screening", .insns = { From a9772285a7246c2a816da7f50c8c1e94b264770d Mon Sep 17 00:00:00 2001 From: Will Deacon Date: Tue, 24 Oct 2017 11:22:46 +0100 Subject: [PATCH 0729/1013] linux/compiler.h: Split into compiler.h and compiler_types.h commit d15155824c5014803d91b829736d249c500bdda6 upstream. linux/compiler.h is included indirectly by linux/types.h via uapi/linux/types.h -> uapi/linux/posix_types.h -> linux/stddef.h -> uapi/linux/stddef.h and is needed to provide a proper definition of offsetof. Unfortunately, compiler.h requires a definition of smp_read_barrier_depends() for defining lockless_dereference() and soon for defining READ_ONCE(), which means that all users of READ_ONCE() will need to include asm/barrier.h to avoid splats such as: In file included from include/uapi/linux/stddef.h:1:0, from include/linux/stddef.h:4, from arch/h8300/kernel/asm-offsets.c:11: include/linux/list.h: In function 'list_empty': >> include/linux/compiler.h:343:2: error: implicit declaration of function 'smp_read_barrier_depends' [-Werror=implicit-function-declaration] smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \ ^ A better alternative is to include asm/barrier.h in linux/compiler.h, but this requires a type definition for "bool" on some architectures (e.g. x86), which is defined later by linux/types.h. Type "bool" is also used directly in linux/compiler.h, so the whole thing is pretty fragile. This patch splits compiler.h in two: compiler_types.h contains type annotations, definitions and the compiler-specific parts, whereas compiler.h #includes compiler-types.h and additionally defines macros such as {READ,WRITE.ACCESS}_ONCE(). uapi/linux/stddef.h and linux/linkage.h are then moved over to include linux/compiler_types.h, which fixes the build for h8 and blackfin. Signed-off-by: Will Deacon Cc: Linus Torvalds Cc: Paul E. 
McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/1508840570-22169-2-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/arm/include/asm/ptrace.h | 3 +- arch/sparc/include/asm/ptrace.h | 1 + arch/um/include/shared/init.h | 2 +- include/linux/compiler-clang.h | 2 +- include/linux/compiler-gcc.h | 2 +- include/linux/compiler-intel.h | 2 +- include/linux/compiler.h | 265 +----------------------------- include/linux/compiler_types.h | 274 ++++++++++++++++++++++++++++++++ include/linux/linkage.h | 2 +- include/uapi/linux/stddef.h | 2 +- scripts/headers_install.sh | 2 +- 11 files changed, 286 insertions(+), 271 deletions(-) create mode 100644 include/linux/compiler_types.h diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h index e9c9a117bd25..c7cdbb43ae7c 100644 --- a/arch/arm/include/asm/ptrace.h +++ b/arch/arm/include/asm/ptrace.h @@ -126,8 +126,7 @@ extern unsigned long profile_pc(struct pt_regs *regs); /* * kprobe-based event tracer support */ -#include -#include +#include #define MAX_REG_OFFSET (offsetof(struct pt_regs, ARM_ORIG_r0)) extern int regs_query_register_offset(const char *name); diff --git a/arch/sparc/include/asm/ptrace.h b/arch/sparc/include/asm/ptrace.h index 6a339a78f4f4..71dd82b43cc5 100644 --- a/arch/sparc/include/asm/ptrace.h +++ b/arch/sparc/include/asm/ptrace.h @@ -7,6 +7,7 @@ #if defined(__sparc__) && defined(__arch64__) #ifndef __ASSEMBLY__ +#include #include #include diff --git a/arch/um/include/shared/init.h b/arch/um/include/shared/init.h index 390572daa40d..b3f5865a92c9 100644 --- a/arch/um/include/shared/init.h +++ b/arch/um/include/shared/init.h @@ -41,7 +41,7 @@ typedef int (*initcall_t)(void); typedef void (*exitcall_t)(void); -#include +#include /* These are for everybody (although not all archs will actually discard it in modules) */ diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 780b1242bf24..3b609edffa8f 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __LINUX_COMPILER_H +#ifndef __LINUX_COMPILER_TYPES_H #error "Please don't include directly, include instead." #endif diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index bb78e5bdff26..2272ded07496 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __LINUX_COMPILER_H +#ifndef __LINUX_COMPILER_TYPES_H #error "Please don't include directly, include instead." #endif diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h index 523d1b74550f..bfa08160db3a 100644 --- a/include/linux/compiler-intel.h +++ b/include/linux/compiler-intel.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __LINUX_COMPILER_H +#ifndef __LINUX_COMPILER_TYPES_H #error "Please don't include directly, include instead." 
#endif diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 712cd8bb00b4..fab5dc250c61 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -2,111 +2,12 @@ #ifndef __LINUX_COMPILER_H #define __LINUX_COMPILER_H -#ifndef __ASSEMBLY__ +#include -#ifdef __CHECKER__ -# define __user __attribute__((noderef, address_space(1))) -# define __kernel __attribute__((address_space(0))) -# define __safe __attribute__((safe)) -# define __force __attribute__((force)) -# define __nocast __attribute__((nocast)) -# define __iomem __attribute__((noderef, address_space(2))) -# define __must_hold(x) __attribute__((context(x,1,1))) -# define __acquires(x) __attribute__((context(x,0,1))) -# define __releases(x) __attribute__((context(x,1,0))) -# define __acquire(x) __context__(x,1) -# define __release(x) __context__(x,-1) -# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) -# define __percpu __attribute__((noderef, address_space(3))) -# define __rcu __attribute__((noderef, address_space(4))) -# define __private __attribute__((noderef)) -extern void __chk_user_ptr(const volatile void __user *); -extern void __chk_io_ptr(const volatile void __iomem *); -# define ACCESS_PRIVATE(p, member) (*((typeof((p)->member) __force *) &(p)->member)) -#else /* __CHECKER__ */ -# ifdef STRUCTLEAK_PLUGIN -# define __user __attribute__((user)) -# else -# define __user -# endif -# define __kernel -# define __safe -# define __force -# define __nocast -# define __iomem -# define __chk_user_ptr(x) (void)0 -# define __chk_io_ptr(x) (void)0 -# define __builtin_warning(x, y...) (1) -# define __must_hold(x) -# define __acquires(x) -# define __releases(x) -# define __acquire(x) (void)0 -# define __release(x) (void)0 -# define __cond_lock(x,c) (c) -# define __percpu -# define __rcu -# define __private -# define ACCESS_PRIVATE(p, member) ((p)->member) -#endif /* __CHECKER__ */ - -/* Indirect macros required for expanded argument pasting, eg. __LINE__. */ -#define ___PASTE(a,b) a##b -#define __PASTE(a,b) ___PASTE(a,b) +#ifndef __ASSEMBLY__ #ifdef __KERNEL__ -#ifdef __GNUC__ -#include -#endif - -#if defined(CC_USING_HOTPATCH) && !defined(__CHECKER__) -#define notrace __attribute__((hotpatch(0,0))) -#else -#define notrace __attribute__((no_instrument_function)) -#endif - -/* Intel compiler defines __GNUC__. So we will overwrite implementations - * coming from above header files here - */ -#ifdef __INTEL_COMPILER -# include -#endif - -/* Clang compiler defines __GNUC__. So we will overwrite implementations - * coming from above header files here - */ -#ifdef __clang__ -#include -#endif - -/* - * Generic compiler-dependent macros required for kernel - * build go below this comment. Actual compiler/compiler version - * specific implementations come from the above header files - */ - -struct ftrace_branch_data { - const char *func; - const char *file; - unsigned line; - union { - struct { - unsigned long correct; - unsigned long incorrect; - }; - struct { - unsigned long miss; - unsigned long hit; - }; - unsigned long miss_hit[2]; - }; -}; - -struct ftrace_likely_data { - struct ftrace_branch_data data; - unsigned long constant; -}; - /* * Note: DISABLE_BRANCH_PROFILING can be used by special lowlevel code * to disable branch tracing on a per file basis. @@ -333,6 +234,7 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s * with an explicit memory barrier or atomic instruction that provides the * required ordering. 
*/ +#include #define __READ_ONCE(x, check) \ ({ \ @@ -364,167 +266,6 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s #endif /* __ASSEMBLY__ */ -#ifdef __KERNEL__ -/* - * Allow us to mark functions as 'deprecated' and have gcc emit a nice - * warning for each use, in hopes of speeding the functions removal. - * Usage is: - * int __deprecated foo(void) - */ -#ifndef __deprecated -# define __deprecated /* unimplemented */ -#endif - -#ifdef MODULE -#define __deprecated_for_modules __deprecated -#else -#define __deprecated_for_modules -#endif - -#ifndef __must_check -#define __must_check -#endif - -#ifndef CONFIG_ENABLE_MUST_CHECK -#undef __must_check -#define __must_check -#endif -#ifndef CONFIG_ENABLE_WARN_DEPRECATED -#undef __deprecated -#undef __deprecated_for_modules -#define __deprecated -#define __deprecated_for_modules -#endif - -#ifndef __malloc -#define __malloc -#endif - -/* - * Allow us to avoid 'defined but not used' warnings on functions and data, - * as well as force them to be emitted to the assembly file. - * - * As of gcc 3.4, static functions that are not marked with attribute((used)) - * may be elided from the assembly file. As of gcc 3.4, static data not so - * marked will not be elided, but this may change in a future gcc version. - * - * NOTE: Because distributions shipped with a backported unit-at-a-time - * compiler in gcc 3.3, we must define __used to be __attribute__((used)) - * for gcc >=3.3 instead of 3.4. - * - * In prior versions of gcc, such functions and data would be emitted, but - * would be warned about except with attribute((unused)). - * - * Mark functions that are referenced only in inline assembly as __used so - * the code is emitted even though it appears to be unreferenced. - */ -#ifndef __used -# define __used /* unimplemented */ -#endif - -#ifndef __maybe_unused -# define __maybe_unused /* unimplemented */ -#endif - -#ifndef __always_unused -# define __always_unused /* unimplemented */ -#endif - -#ifndef noinline -#define noinline -#endif - -/* - * Rather then using noinline to prevent stack consumption, use - * noinline_for_stack instead. For documentation reasons. - */ -#define noinline_for_stack noinline - -#ifndef __always_inline -#define __always_inline inline -#endif - -#endif /* __KERNEL__ */ - -/* - * From the GCC manual: - * - * Many functions do not examine any values except their arguments, - * and have no effects except the return value. Basically this is - * just slightly more strict class than the `pure' attribute above, - * since function is not allowed to read global memory. - * - * Note that a function that has pointer arguments and examines the - * data pointed to must _not_ be declared `const'. Likewise, a - * function that calls a non-`const' function usually must not be - * `const'. It does not make sense for a `const' function to return - * `void'. - */ -#ifndef __attribute_const__ -# define __attribute_const__ /* unimplemented */ -#endif - -#ifndef __designated_init -# define __designated_init -#endif - -#ifndef __latent_entropy -# define __latent_entropy -#endif - -#ifndef __randomize_layout -# define __randomize_layout __designated_init -#endif - -#ifndef __no_randomize_layout -# define __no_randomize_layout -#endif - -#ifndef randomized_struct_fields_start -# define randomized_struct_fields_start -# define randomized_struct_fields_end -#endif - -/* - * Tell gcc if a function is cold. The compiler will assume any path - * directly leading to the call is unlikely. 
- */ - -#ifndef __cold -#define __cold -#endif - -/* Simple shorthand for a section definition */ -#ifndef __section -# define __section(S) __attribute__ ((__section__(#S))) -#endif - -#ifndef __visible -#define __visible -#endif - -#ifndef __nostackprotector -# define __nostackprotector -#endif - -/* - * Assume alignment of return value. - */ -#ifndef __assume_aligned -#define __assume_aligned(a, ...) -#endif - - -/* Are two types/vars the same type (ignoring qualifiers)? */ -#ifndef __same_type -# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) -#endif - -/* Is this type a native word size -- useful for atomic operations */ -#ifndef __native_word -# define __native_word(t) (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) -#endif - /* Compile time object size, -1 for unknown */ #ifndef __compiletime_object_size # define __compiletime_object_size(obj) -1 diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h new file mode 100644 index 000000000000..6b79a9bba9a7 --- /dev/null +++ b/include/linux/compiler_types.h @@ -0,0 +1,274 @@ +#ifndef __LINUX_COMPILER_TYPES_H +#define __LINUX_COMPILER_TYPES_H + +#ifndef __ASSEMBLY__ + +#ifdef __CHECKER__ +# define __user __attribute__((noderef, address_space(1))) +# define __kernel __attribute__((address_space(0))) +# define __safe __attribute__((safe)) +# define __force __attribute__((force)) +# define __nocast __attribute__((nocast)) +# define __iomem __attribute__((noderef, address_space(2))) +# define __must_hold(x) __attribute__((context(x,1,1))) +# define __acquires(x) __attribute__((context(x,0,1))) +# define __releases(x) __attribute__((context(x,1,0))) +# define __acquire(x) __context__(x,1) +# define __release(x) __context__(x,-1) +# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) +# define __percpu __attribute__((noderef, address_space(3))) +# define __rcu __attribute__((noderef, address_space(4))) +# define __private __attribute__((noderef)) +extern void __chk_user_ptr(const volatile void __user *); +extern void __chk_io_ptr(const volatile void __iomem *); +# define ACCESS_PRIVATE(p, member) (*((typeof((p)->member) __force *) &(p)->member)) +#else /* __CHECKER__ */ +# ifdef STRUCTLEAK_PLUGIN +# define __user __attribute__((user)) +# else +# define __user +# endif +# define __kernel +# define __safe +# define __force +# define __nocast +# define __iomem +# define __chk_user_ptr(x) (void)0 +# define __chk_io_ptr(x) (void)0 +# define __builtin_warning(x, y...) (1) +# define __must_hold(x) +# define __acquires(x) +# define __releases(x) +# define __acquire(x) (void)0 +# define __release(x) (void)0 +# define __cond_lock(x,c) (c) +# define __percpu +# define __rcu +# define __private +# define ACCESS_PRIVATE(p, member) ((p)->member) +#endif /* __CHECKER__ */ + +/* Indirect macros required for expanded argument pasting, eg. __LINE__. */ +#define ___PASTE(a,b) a##b +#define __PASTE(a,b) ___PASTE(a,b) + +#ifdef __KERNEL__ + +#ifdef __GNUC__ +#include +#endif + +#if defined(CC_USING_HOTPATCH) && !defined(__CHECKER__) +#define notrace __attribute__((hotpatch(0,0))) +#else +#define notrace __attribute__((no_instrument_function)) +#endif + +/* Intel compiler defines __GNUC__. So we will overwrite implementations + * coming from above header files here + */ +#ifdef __INTEL_COMPILER +# include +#endif + +/* Clang compiler defines __GNUC__. 
So we will overwrite implementations + * coming from above header files here + */ +#ifdef __clang__ +#include +#endif + +/* + * Generic compiler-dependent macros required for kernel + * build go below this comment. Actual compiler/compiler version + * specific implementations come from the above header files + */ + +struct ftrace_branch_data { + const char *func; + const char *file; + unsigned line; + union { + struct { + unsigned long correct; + unsigned long incorrect; + }; + struct { + unsigned long miss; + unsigned long hit; + }; + unsigned long miss_hit[2]; + }; +}; + +struct ftrace_likely_data { + struct ftrace_branch_data data; + unsigned long constant; +}; + +#endif /* __KERNEL__ */ + +#endif /* __ASSEMBLY__ */ + +#ifdef __KERNEL__ +/* + * Allow us to mark functions as 'deprecated' and have gcc emit a nice + * warning for each use, in hopes of speeding the functions removal. + * Usage is: + * int __deprecated foo(void) + */ +#ifndef __deprecated +# define __deprecated /* unimplemented */ +#endif + +#ifdef MODULE +#define __deprecated_for_modules __deprecated +#else +#define __deprecated_for_modules +#endif + +#ifndef __must_check +#define __must_check +#endif + +#ifndef CONFIG_ENABLE_MUST_CHECK +#undef __must_check +#define __must_check +#endif +#ifndef CONFIG_ENABLE_WARN_DEPRECATED +#undef __deprecated +#undef __deprecated_for_modules +#define __deprecated +#define __deprecated_for_modules +#endif + +#ifndef __malloc +#define __malloc +#endif + +/* + * Allow us to avoid 'defined but not used' warnings on functions and data, + * as well as force them to be emitted to the assembly file. + * + * As of gcc 3.4, static functions that are not marked with attribute((used)) + * may be elided from the assembly file. As of gcc 3.4, static data not so + * marked will not be elided, but this may change in a future gcc version. + * + * NOTE: Because distributions shipped with a backported unit-at-a-time + * compiler in gcc 3.3, we must define __used to be __attribute__((used)) + * for gcc >=3.3 instead of 3.4. + * + * In prior versions of gcc, such functions and data would be emitted, but + * would be warned about except with attribute((unused)). + * + * Mark functions that are referenced only in inline assembly as __used so + * the code is emitted even though it appears to be unreferenced. + */ +#ifndef __used +# define __used /* unimplemented */ +#endif + +#ifndef __maybe_unused +# define __maybe_unused /* unimplemented */ +#endif + +#ifndef __always_unused +# define __always_unused /* unimplemented */ +#endif + +#ifndef noinline +#define noinline +#endif + +/* + * Rather then using noinline to prevent stack consumption, use + * noinline_for_stack instead. For documentation reasons. + */ +#define noinline_for_stack noinline + +#ifndef __always_inline +#define __always_inline inline +#endif + +#endif /* __KERNEL__ */ + +/* + * From the GCC manual: + * + * Many functions do not examine any values except their arguments, + * and have no effects except the return value. Basically this is + * just slightly more strict class than the `pure' attribute above, + * since function is not allowed to read global memory. + * + * Note that a function that has pointer arguments and examines the + * data pointed to must _not_ be declared `const'. Likewise, a + * function that calls a non-`const' function usually must not be + * `const'. It does not make sense for a `const' function to return + * `void'. 
+ */ +#ifndef __attribute_const__ +# define __attribute_const__ /* unimplemented */ +#endif + +#ifndef __designated_init +# define __designated_init +#endif + +#ifndef __latent_entropy +# define __latent_entropy +#endif + +#ifndef __randomize_layout +# define __randomize_layout __designated_init +#endif + +#ifndef __no_randomize_layout +# define __no_randomize_layout +#endif + +#ifndef randomized_struct_fields_start +# define randomized_struct_fields_start +# define randomized_struct_fields_end +#endif + +/* + * Tell gcc if a function is cold. The compiler will assume any path + * directly leading to the call is unlikely. + */ + +#ifndef __cold +#define __cold +#endif + +/* Simple shorthand for a section definition */ +#ifndef __section +# define __section(S) __attribute__ ((__section__(#S))) +#endif + +#ifndef __visible +#define __visible +#endif + +#ifndef __nostackprotector +# define __nostackprotector +#endif + +/* + * Assume alignment of return value. + */ +#ifndef __assume_aligned +#define __assume_aligned(a, ...) +#endif + + +/* Are two types/vars the same type (ignoring qualifiers)? */ +#ifndef __same_type +# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b)) +#endif + +/* Is this type a native word size -- useful for atomic operations */ +#ifndef __native_word +# define __native_word(t) (sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) || sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long)) +#endif + +#endif /* __LINUX_COMPILER_TYPES_H */ diff --git a/include/linux/linkage.h b/include/linux/linkage.h index 2e6f90bd52aa..f68db9e450eb 100644 --- a/include/linux/linkage.h +++ b/include/linux/linkage.h @@ -2,7 +2,7 @@ #ifndef _LINUX_LINKAGE_H #define _LINUX_LINKAGE_H -#include +#include #include #include #include diff --git a/include/uapi/linux/stddef.h b/include/uapi/linux/stddef.h index f65b92e0e1f9..ee8220f8dcf5 100644 --- a/include/uapi/linux/stddef.h +++ b/include/uapi/linux/stddef.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ -#include +#include #ifndef __always_inline #define __always_inline inline diff --git a/scripts/headers_install.sh b/scripts/headers_install.sh index 4d1ea96e8794..a18bca720995 100755 --- a/scripts/headers_install.sh +++ b/scripts/headers_install.sh @@ -34,7 +34,7 @@ do sed -r \ -e 's/([ \t(])(__user|__force|__iomem)[ \t]/\1/g' \ -e 's/__attribute_const__([ \t]|$)/\1/g' \ - -e 's@^#include @@' \ + -e 's@^#include @@' \ -e 's/(^|[^a-zA-Z0-9])__packed([^a-zA-Z0-9_]|$)/\1__attribute__((packed))\2/g' \ -e 's/(^|[ \t(])(inline|asm|volatile)([ \t(]|$)/\1__\2__\3/g' \ -e 's@#(ifndef|define|endif[ \t]*/[*])[ \t]*_UAPI@#\1 @' \ From dad5c1402c570cd07a80113784bc20a7f930c8ae Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 25 Dec 2017 14:26:48 +0100 Subject: [PATCH 0730/1013] Linux 4.14.9 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 41632ea25055..ed2132c6d286 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 8 +SUBLEVEL = 9 EXTRAVERSION = NAME = Petit Gorille From add9f2a47035fe94603b8ffe8d1d323b5fb0429f Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 27 Dec 2017 13:53:08 +0100 Subject: [PATCH 0731/1013] Revert "ipv6: grab rt->rt6i_ref before allocating pcpu rt" This reverts commit 9704f8147e88213f2fa580f713b42b08a4f1a7d2 which was upstream commit a94b9367e044ba672c9f4105eb1516ff6ff4948a. Shouldn't have been here, sorry about that. 
Reported-by: Chris Rankin Reported-by: Willy Tarreau Cc: Ido Schimmel Cc: Ozgur Cc: Wei Wang Cc: Martin KaFai Lau Cc: Eric Dumazet Cc: David S. Miller Cc: Sasha Levin Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 58 ++++++++++++++++++++++++------------------------ 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 76b47682f77f..598efa8cfe25 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1055,6 +1055,7 @@ static struct rt6_info *rt6_get_pcpu_route(struct rt6_info *rt) static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) { + struct fib6_table *table = rt->rt6i_table; struct rt6_info *pcpu_rt, *prev, **p; pcpu_rt = ip6_rt_pcpu_alloc(rt); @@ -1065,20 +1066,28 @@ static struct rt6_info *rt6_make_pcpu_route(struct rt6_info *rt) return net->ipv6.ip6_null_entry; } - dst_hold(&pcpu_rt->dst); - p = this_cpu_ptr(rt->rt6i_pcpu); - prev = cmpxchg(p, NULL, pcpu_rt); - if (prev) { - /* If someone did it before us, return prev instead */ - /* release refcnt taken by ip6_rt_pcpu_alloc() */ - dst_release_immediate(&pcpu_rt->dst); - /* release refcnt taken by above dst_hold() */ + read_lock_bh(&table->tb6_lock); + if (rt->rt6i_pcpu) { + p = this_cpu_ptr(rt->rt6i_pcpu); + prev = cmpxchg(p, NULL, pcpu_rt); + if (prev) { + /* If someone did it before us, return prev instead */ + dst_release_immediate(&pcpu_rt->dst); + pcpu_rt = prev; + } + } else { + /* rt has been removed from the fib6 tree + * before we have a chance to acquire the read_lock. + * In this case, don't brother to create a pcpu rt + * since rt is going away anyway. The next + * dst_check() will trigger a re-lookup. + */ dst_release_immediate(&pcpu_rt->dst); - dst_hold(&prev->dst); - pcpu_rt = prev; + pcpu_rt = rt; } - + dst_hold(&pcpu_rt->dst); rt6_dst_from_metrics_check(pcpu_rt); + read_unlock_bh(&table->tb6_lock); return pcpu_rt; } @@ -1168,28 +1177,19 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table, if (pcpu_rt) { read_unlock_bh(&table->tb6_lock); } else { - /* atomic_inc_not_zero() is needed when using rcu */ - if (atomic_inc_not_zero(&rt->rt6i_ref)) { - /* We have to do the read_unlock first - * because rt6_make_pcpu_route() may trigger - * ip6_dst_gc() which will take the write_lock. - * - * No dst_hold() on rt is needed because grabbing - * rt->rt6i_ref makes sure rt can't be released. - */ - read_unlock_bh(&table->tb6_lock); - pcpu_rt = rt6_make_pcpu_route(rt); - rt6_release(rt); - } else { - /* rt is already removed from tree */ - read_unlock_bh(&table->tb6_lock); - pcpu_rt = net->ipv6.ip6_null_entry; - dst_hold(&pcpu_rt->dst); - } + /* We have to do the read_unlock first + * because rt6_make_pcpu_route() may trigger + * ip6_dst_gc() which will take the write_lock. + */ + dst_hold(&rt->dst); + read_unlock_bh(&table->tb6_lock); + pcpu_rt = rt6_make_pcpu_route(rt); + dst_release(&rt->dst); } trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6); return pcpu_rt; + } } EXPORT_SYMBOL_GPL(ip6_pol_route); From 62c37437a110cfec0d2b46f5d28113a631187978 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 6 Nov 2017 07:21:50 -0600 Subject: [PATCH 0732/1013] objtool: Move synced files to their original relative locations commit 6a77cff819ae3e31992bde6432c9b5720748a89b upstream. This will enable more straightforward comparisons, and it also makes the files 100% identical. 
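For illustration, once the copies live at their upstream-relative paths, keeping them in sync reduces to plain byte-for-byte comparisons, with no need to filter out differing #include lines; a minimal sketch, assuming the layout introduced below (the authoritative command list is in the Makefile hunk that follows):

    cd tools/objtool
    diff arch/x86/lib/insn.c              ../../arch/x86/lib/insn.c
    diff arch/x86/include/asm/insn.h      ../../arch/x86/include/asm/insn.h
    diff arch/x86/include/asm/orc_types.h ../../arch/x86/include/asm/orc_types.h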
Suggested-by: Ingo Molnar Signed-off-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/407b2aaa317741f48fcf821592c0e96ab3be1890.1509974346.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/objtool/.gitignore | 2 +- tools/objtool/Makefile | 22 ++++++++++--------- tools/objtool/arch/x86/Build | 10 ++++----- tools/objtool/arch/x86/decode.c | 6 ++--- .../arch/x86/{insn => include/asm}/inat.h | 2 +- .../x86/{insn => include/asm}/inat_types.h | 0 .../arch/x86/{insn => include/asm}/insn.h | 2 +- .../{ => arch/x86/include/asm}/orc_types.h | 0 tools/objtool/arch/x86/{insn => lib}/inat.c | 2 +- tools/objtool/arch/x86/{insn => lib}/insn.c | 4 ++-- .../arch/x86/{insn => lib}/x86-opcode-map.txt | 0 .../x86/{insn => tools}/gen-insn-attr-x86.awk | 0 tools/objtool/orc.h | 2 +- 13 files changed, 27 insertions(+), 25 deletions(-) rename tools/objtool/arch/x86/{insn => include/asm}/inat.h (99%) rename tools/objtool/arch/x86/{insn => include/asm}/inat_types.h (100%) rename tools/objtool/arch/x86/{insn => include/asm}/insn.h (99%) rename tools/objtool/{ => arch/x86/include/asm}/orc_types.h (100%) rename tools/objtool/arch/x86/{insn => lib}/inat.c (99%) rename tools/objtool/arch/x86/{insn => lib}/insn.c (99%) rename tools/objtool/arch/x86/{insn => lib}/x86-opcode-map.txt (100%) rename tools/objtool/arch/x86/{insn => tools}/gen-insn-attr-x86.awk (100%) diff --git a/tools/objtool/.gitignore b/tools/objtool/.gitignore index d3102c865a95..914cff12899b 100644 --- a/tools/objtool/.gitignore +++ b/tools/objtool/.gitignore @@ -1,3 +1,3 @@ -arch/x86/insn/inat-tables.c +arch/x86/lib/inat-tables.c objtool fixdep diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile index 424b1965d06f..c6a19d946ec1 100644 --- a/tools/objtool/Makefile +++ b/tools/objtool/Makefile @@ -25,7 +25,9 @@ OBJTOOL_IN := $(OBJTOOL)-in.o all: $(OBJTOOL) -INCLUDES := -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(HOSTARCH)/include/uapi +INCLUDES := -I$(srctree)/tools/include \ + -I$(srctree)/tools/arch/$(HOSTARCH)/include/uapi \ + -I$(srctree)/tools/objtool/arch/$(HOSTARCH)/include WARNINGS := $(EXTRA_WARNINGS) -Wno-switch-default -Wno-switch-enum -Wno-packed CFLAGS += -Wall -Werror $(WARNINGS) -fomit-frame-pointer -O2 -g $(INCLUDES) LDFLAGS += -lelf $(LIBSUBCMD) @@ -46,16 +48,16 @@ $(OBJTOOL_IN): fixdep FORCE $(OBJTOOL): $(LIBSUBCMD) $(OBJTOOL_IN) @(diff -I 2>&1 | grep -q 'option requires an argument' && \ test -d ../../kernel -a -d ../../tools -a -d ../objtool && (( \ - diff -I'^#include' arch/x86/insn/insn.c ../../arch/x86/lib/insn.c >/dev/null && \ - diff -I'^#include' arch/x86/insn/inat.c ../../arch/x86/lib/inat.c >/dev/null && \ - diff arch/x86/insn/x86-opcode-map.txt ../../arch/x86/lib/x86-opcode-map.txt >/dev/null && \ - diff arch/x86/insn/gen-insn-attr-x86.awk ../../arch/x86/tools/gen-insn-attr-x86.awk >/dev/null && \ - diff -I'^#include' arch/x86/insn/insn.h ../../arch/x86/include/asm/insn.h >/dev/null && \ - diff -I'^#include' arch/x86/insn/inat.h ../../arch/x86/include/asm/inat.h >/dev/null && \ - diff -I'^#include' arch/x86/insn/inat_types.h ../../arch/x86/include/asm/inat_types.h >/dev/null) \ + diff arch/x86/lib/insn.c ../../arch/x86/lib/insn.c >/dev/null && \ + diff arch/x86/lib/inat.c ../../arch/x86/lib/inat.c >/dev/null && \ + diff arch/x86/lib/x86-opcode-map.txt ../../arch/x86/lib/x86-opcode-map.txt >/dev/null && \ + diff arch/x86/tools/gen-insn-attr-x86.awk ../../arch/x86/tools/gen-insn-attr-x86.awk >/dev/null && \ + 
diff arch/x86/include/asm/insn.h ../../arch/x86/include/asm/insn.h >/dev/null && \ + diff arch/x86/include/asm/inat.h ../../arch/x86/include/asm/inat.h >/dev/null && \ + diff arch/x86/include/asm/inat_types.h ../../arch/x86/include/asm/inat_types.h >/dev/null) \ || echo "warning: objtool: x86 instruction decoder differs from kernel" >&2 )) || true @(test -d ../../kernel -a -d ../../tools -a -d ../objtool && (( \ - diff ../../arch/x86/include/asm/orc_types.h orc_types.h >/dev/null) \ + diff ../../arch/x86/include/asm/orc_types.h arch/x86/include/asm/orc_types.h >/dev/null) \ || echo "warning: objtool: orc_types.h differs from kernel" >&2 )) || true $(QUIET_LINK)$(CC) $(OBJTOOL_IN) $(LDFLAGS) -o $@ @@ -66,7 +68,7 @@ $(LIBSUBCMD): fixdep FORCE clean: $(call QUIET_CLEAN, objtool) $(RM) $(OBJTOOL) $(Q)find $(OUTPUT) -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete - $(Q)$(RM) $(OUTPUT)arch/x86/insn/inat-tables.c $(OUTPUT)fixdep + $(Q)$(RM) $(OUTPUT)arch/x86/lib/inat-tables.c $(OUTPUT)fixdep FORCE: diff --git a/tools/objtool/arch/x86/Build b/tools/objtool/arch/x86/Build index debbdb0b5c43..b998412c017d 100644 --- a/tools/objtool/arch/x86/Build +++ b/tools/objtool/arch/x86/Build @@ -1,12 +1,12 @@ objtool-y += decode.o -inat_tables_script = arch/x86/insn/gen-insn-attr-x86.awk -inat_tables_maps = arch/x86/insn/x86-opcode-map.txt +inat_tables_script = arch/x86/tools/gen-insn-attr-x86.awk +inat_tables_maps = arch/x86/lib/x86-opcode-map.txt -$(OUTPUT)arch/x86/insn/inat-tables.c: $(inat_tables_script) $(inat_tables_maps) +$(OUTPUT)arch/x86/lib/inat-tables.c: $(inat_tables_script) $(inat_tables_maps) $(call rule_mkdir) $(Q)$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ -$(OUTPUT)arch/x86/decode.o: $(OUTPUT)arch/x86/insn/inat-tables.c +$(OUTPUT)arch/x86/decode.o: $(OUTPUT)arch/x86/lib/inat-tables.c -CFLAGS_decode.o += -I$(OUTPUT)arch/x86/insn +CFLAGS_decode.o += -I$(OUTPUT)arch/x86/lib diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c index 34a579f806e3..8acfc47af70e 100644 --- a/tools/objtool/arch/x86/decode.c +++ b/tools/objtool/arch/x86/decode.c @@ -19,9 +19,9 @@ #include #define unlikely(cond) (cond) -#include "insn/insn.h" -#include "insn/inat.c" -#include "insn/insn.c" +#include +#include "lib/inat.c" +#include "lib/insn.c" #include "../../elf.h" #include "../../arch.h" diff --git a/tools/objtool/arch/x86/insn/inat.h b/tools/objtool/arch/x86/include/asm/inat.h similarity index 99% rename from tools/objtool/arch/x86/insn/inat.h rename to tools/objtool/arch/x86/include/asm/inat.h index 125ecd2a300d..02aff0867211 100644 --- a/tools/objtool/arch/x86/insn/inat.h +++ b/tools/objtool/arch/x86/include/asm/inat.h @@ -20,7 +20,7 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ -#include "inat_types.h" +#include /* * Internal bits. 
Don't use bitmasks directly, because these bits are diff --git a/tools/objtool/arch/x86/insn/inat_types.h b/tools/objtool/arch/x86/include/asm/inat_types.h similarity index 100% rename from tools/objtool/arch/x86/insn/inat_types.h rename to tools/objtool/arch/x86/include/asm/inat_types.h diff --git a/tools/objtool/arch/x86/insn/insn.h b/tools/objtool/arch/x86/include/asm/insn.h similarity index 99% rename from tools/objtool/arch/x86/insn/insn.h rename to tools/objtool/arch/x86/include/asm/insn.h index e23578c7b1be..b3e32b010ab1 100644 --- a/tools/objtool/arch/x86/insn/insn.h +++ b/tools/objtool/arch/x86/include/asm/insn.h @@ -21,7 +21,7 @@ */ /* insn_attr_t is defined in inat.h */ -#include "inat.h" +#include struct insn_field { union { diff --git a/tools/objtool/orc_types.h b/tools/objtool/arch/x86/include/asm/orc_types.h similarity index 100% rename from tools/objtool/orc_types.h rename to tools/objtool/arch/x86/include/asm/orc_types.h diff --git a/tools/objtool/arch/x86/insn/inat.c b/tools/objtool/arch/x86/lib/inat.c similarity index 99% rename from tools/objtool/arch/x86/insn/inat.c rename to tools/objtool/arch/x86/lib/inat.c index e4bf28e6f4c7..c1f01a8e9f65 100644 --- a/tools/objtool/arch/x86/insn/inat.c +++ b/tools/objtool/arch/x86/lib/inat.c @@ -18,7 +18,7 @@ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * */ -#include "insn.h" +#include /* Attribute tables are generated from opcode map */ #include "inat-tables.c" diff --git a/tools/objtool/arch/x86/insn/insn.c b/tools/objtool/arch/x86/lib/insn.c similarity index 99% rename from tools/objtool/arch/x86/insn/insn.c rename to tools/objtool/arch/x86/lib/insn.c index ca983e2bea8b..1088eb8f3a5f 100644 --- a/tools/objtool/arch/x86/insn/insn.c +++ b/tools/objtool/arch/x86/lib/insn.c @@ -23,8 +23,8 @@ #else #include #endif -#include "inat.h" -#include "insn.h" +#include +#include /* Verify next sizeof(t) bytes can be on the same instruction */ #define validate_next(t, insn, n) \ diff --git a/tools/objtool/arch/x86/insn/x86-opcode-map.txt b/tools/objtool/arch/x86/lib/x86-opcode-map.txt similarity index 100% rename from tools/objtool/arch/x86/insn/x86-opcode-map.txt rename to tools/objtool/arch/x86/lib/x86-opcode-map.txt diff --git a/tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk b/tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk similarity index 100% rename from tools/objtool/arch/x86/insn/gen-insn-attr-x86.awk rename to tools/objtool/arch/x86/tools/gen-insn-attr-x86.awk diff --git a/tools/objtool/orc.h b/tools/objtool/orc.h index a4139e386ef3..b0e92a6d0903 100644 --- a/tools/objtool/orc.h +++ b/tools/objtool/orc.h @@ -18,7 +18,7 @@ #ifndef _ORC_H #define _ORC_H -#include "orc_types.h" +#include struct objtool_file; From 2845aee45c363e15c1e677b1f4d0043f361c102b Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 6 Nov 2017 07:21:51 -0600 Subject: [PATCH 0733/1013] objtool: Move kernel headers/code sync check to a script commit 3bd51c5a371de917e4e7401c9df006b5998579df upstream. Replace the nasty diff checks in the objtool Makefile with a clean bash script, and make the warnings more specific. Heavily inspired by tools/perf/check-headers.sh. 
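As a rough sketch of what the new check looks like in use (hypothetical run; the warning wording is taken from the script added below, which silently exits when not run from inside a kernel tree):

    $ cd tools/objtool && ./sync-check.sh
    Warning: synced file at 'tools/objtool/arch/x86/lib/insn.c' differs from latest kernel version at 'arch/x86/lib/insn.c'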
Suggested-by: Ingo Molnar Signed-off-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/ab015f15ccd8c0c6008493c3c6ee3d495eaf2927.1509974346.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/objtool/Makefile | 16 +--------------- tools/objtool/sync-check.sh | 29 +++++++++++++++++++++++++++++ 2 files changed, 30 insertions(+), 15 deletions(-) create mode 100755 tools/objtool/sync-check.sh diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile index c6a19d946ec1..6aaed251b4ed 100644 --- a/tools/objtool/Makefile +++ b/tools/objtool/Makefile @@ -43,22 +43,8 @@ include $(srctree)/tools/build/Makefile.include $(OBJTOOL_IN): fixdep FORCE @$(MAKE) $(build)=objtool -# Busybox's diff doesn't have -I, avoid warning in that case -# $(OBJTOOL): $(LIBSUBCMD) $(OBJTOOL_IN) - @(diff -I 2>&1 | grep -q 'option requires an argument' && \ - test -d ../../kernel -a -d ../../tools -a -d ../objtool && (( \ - diff arch/x86/lib/insn.c ../../arch/x86/lib/insn.c >/dev/null && \ - diff arch/x86/lib/inat.c ../../arch/x86/lib/inat.c >/dev/null && \ - diff arch/x86/lib/x86-opcode-map.txt ../../arch/x86/lib/x86-opcode-map.txt >/dev/null && \ - diff arch/x86/tools/gen-insn-attr-x86.awk ../../arch/x86/tools/gen-insn-attr-x86.awk >/dev/null && \ - diff arch/x86/include/asm/insn.h ../../arch/x86/include/asm/insn.h >/dev/null && \ - diff arch/x86/include/asm/inat.h ../../arch/x86/include/asm/inat.h >/dev/null && \ - diff arch/x86/include/asm/inat_types.h ../../arch/x86/include/asm/inat_types.h >/dev/null) \ - || echo "warning: objtool: x86 instruction decoder differs from kernel" >&2 )) || true - @(test -d ../../kernel -a -d ../../tools -a -d ../objtool && (( \ - diff ../../arch/x86/include/asm/orc_types.h arch/x86/include/asm/orc_types.h >/dev/null) \ - || echo "warning: objtool: orc_types.h differs from kernel" >&2 )) || true + @./sync-check.sh $(QUIET_LINK)$(CC) $(OBJTOOL_IN) $(LDFLAGS) -o $@ diff --git a/tools/objtool/sync-check.sh b/tools/objtool/sync-check.sh new file mode 100755 index 000000000000..1470e74e9d66 --- /dev/null +++ b/tools/objtool/sync-check.sh @@ -0,0 +1,29 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0 + +FILES=' +arch/x86/lib/insn.c +arch/x86/lib/inat.c +arch/x86/lib/x86-opcode-map.txt +arch/x86/tools/gen-insn-attr-x86.awk +arch/x86/include/asm/insn.h +arch/x86/include/asm/inat.h +arch/x86/include/asm/inat_types.h +arch/x86/include/asm/orc_types.h +' + +check() +{ + local file=$1 + + diff $file ../../$file > /dev/null || + echo "Warning: synced file at 'tools/objtool/$file' differs from latest kernel version at '$file'" +} + +if [ ! -d ../../kernel ] || [ ! -d ../../tools ] || [ ! -d ../objtool ]; then + exit 0 +fi + +for i in $FILES; do + check $i +done From 3033f9e601f45f372d48bd266629abd681cea116 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Tue, 7 Nov 2017 21:01:52 -0600 Subject: [PATCH 0734/1013] objtool: Fix cross-build commit 9eb719855f6c9b21eb5889d9ac2ca1c60527ad89 upstream. Stephen Rothwell reported this cross-compilation build failure: | In file included from orc_dump.c:19:0: | orc.h:21:10: fatal error: asm/orc_types.h: No such file or directory | ... Caused by: 6a77cff819ae ("objtool: Move synced files to their original relative locations") Use the proper arch header files location, not the host-arch location. 
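A sketch of the failure mode (hypothetical host/target pair; tools/objtool only carries an x86 arch directory, and the affected header is the relocated asm/orc_types.h):

    # cross build: x86_64 target kernel on a non-x86 build host, so HOSTARCH != ARCH
    # before: -I$(srctree)/tools/objtool/arch/$(HOSTARCH)/include   -> directory does not exist
    # after:  -I$(srctree)/tools/objtool/arch/$(ARCH)/include       -> tools/objtool/arch/x86/include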
Bisected-by: Stephen Rothwell Reported-by: Stephen Rothwell Signed-off-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Linux-Next Mailing List Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20171108030152.bd76eahiwjwjt3kp@treble Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/objtool/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile index 6aaed251b4ed..0f94af3ccaaa 100644 --- a/tools/objtool/Makefile +++ b/tools/objtool/Makefile @@ -27,7 +27,7 @@ all: $(OBJTOOL) INCLUDES := -I$(srctree)/tools/include \ -I$(srctree)/tools/arch/$(HOSTARCH)/include/uapi \ - -I$(srctree)/tools/objtool/arch/$(HOSTARCH)/include + -I$(srctree)/tools/objtool/arch/$(ARCH)/include WARNINGS := $(EXTRA_WARNINGS) -Wno-switch-default -Wno-switch-enum -Wno-packed CFLAGS += -Wall -Werror $(WARNINGS) -fomit-frame-pointer -O2 -g $(INCLUDES) LDFLAGS += -lelf $(LIBSUBCMD) From 6a8f7688094c05e02b392b98e6977497a5096e20 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Tue, 14 Nov 2017 07:24:22 +0100 Subject: [PATCH 0735/1013] tools/headers: Sync objtool UAPI header commit a356d2ae50790f49858ebed35da9e206336fafee upstream. objtool grew this new warning: Warning: synced file at 'tools/objtool/arch/x86/include/asm/inat.h' differs from latest kernel version at 'arch/x86/include/asm/inat.h' which upstream header grew new INAT_SEG_* definitions. Sync up the tooling version of the header. Reported-by: Linus Torvalds Cc: Josh Poimboeuf Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/objtool/arch/x86/include/asm/inat.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/tools/objtool/arch/x86/include/asm/inat.h b/tools/objtool/arch/x86/include/asm/inat.h index 02aff0867211..1c78580e58be 100644 --- a/tools/objtool/arch/x86/include/asm/inat.h +++ b/tools/objtool/arch/x86/include/asm/inat.h @@ -97,6 +97,16 @@ #define INAT_MAKE_GROUP(grp) ((grp << INAT_GRP_OFFS) | INAT_MODRM) #define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS) +/* Identifiers for segment registers */ +#define INAT_SEG_REG_IGNORE 0 +#define INAT_SEG_REG_DEFAULT 1 +#define INAT_SEG_REG_CS 2 +#define INAT_SEG_REG_SS 3 +#define INAT_SEG_REG_DS 4 +#define INAT_SEG_REG_ES 5 +#define INAT_SEG_REG_FS 6 +#define INAT_SEG_REG_GS 7 + /* Attribute search APIs */ extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode); extern int inat_get_last_prefix_id(insn_byte_t last_pfx); From 76358c8d90c73c896000ccee3375b709bc955bc5 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Sat, 2 Dec 2017 16:17:44 -0600 Subject: [PATCH 0736/1013] objtool: Fix 64-bit build on 32-bit host commit 14c47b54b0d9389e3ca0718e805cdd90c5a4303a upstream. The new ORC unwinder breaks the build of a 64-bit kernel on a 32-bit host. 
Building the kernel on a i386 or x32 host fails with: orc_dump.c: In function 'orc_dump': orc_dump.c:105:26: error: passing argument 2 of 'elf_getshdrnum' from incompatible pointer type [-Werror=incompatible-pointer-types] if (elf_getshdrnum(elf, &nr_sections)) { ^ In file included from /usr/local/include/gelf.h:32:0, from elf.h:22, from warn.h:26, from orc_dump.c:20: /usr/local/include/libelf.h:304:12: note: expected 'size_t * {aka unsigned int *}' but argument is of type 'long unsigned int *' extern int elf_getshdrnum (Elf *__elf, size_t *__dst); ^~~~~~~~~~~~~~ orc_dump.c:190:17: error: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'Elf64_Sxword {aka long long int}' [-Werror=format=] printf("%s+%lx:", name, rela.r_addend); ~~^ ~~~~~~~~~~~~~ %llx Fix the build failure. Another problem is that if the user specifies HOSTCC or HOSTLD variables, they are ignored in the objtool makefile. Change the Makefile to respect these variables. Signed-off-by: Mikulas Patocka Signed-off-by: Josh Poimboeuf Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Sven Joachim Cc: Thomas Gleixner Fixes: 627fce14809b ("objtool: Add ORC unwind table generation") Link: http://lkml.kernel.org/r/19f0e64d8e07e30a7b307cd010eb780c404fe08d.1512252895.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- tools/objtool/Makefile | 8 +++++--- tools/objtool/orc_dump.c | 7 ++++--- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile index 0f94af3ccaaa..ae0272f9a091 100644 --- a/tools/objtool/Makefile +++ b/tools/objtool/Makefile @@ -7,9 +7,11 @@ ARCH := x86 endif # always use the host compiler -CC = gcc -LD = ld -AR = ar +HOSTCC ?= gcc +HOSTLD ?= ld +CC = $(HOSTCC) +LD = $(HOSTLD) +AR = ar ifeq ($(srctree),) srctree := $(patsubst %/,%,$(dir $(CURDIR))) diff --git a/tools/objtool/orc_dump.c b/tools/objtool/orc_dump.c index 36c5bf6a2675..c3343820916a 100644 --- a/tools/objtool/orc_dump.c +++ b/tools/objtool/orc_dump.c @@ -76,7 +76,8 @@ int orc_dump(const char *_objname) int fd, nr_entries, i, *orc_ip = NULL, orc_size = 0; struct orc_entry *orc = NULL; char *name; - unsigned long nr_sections, orc_ip_addr = 0; + size_t nr_sections; + Elf64_Addr orc_ip_addr = 0; size_t shstrtab_idx; Elf *elf; Elf_Scn *scn; @@ -187,10 +188,10 @@ int orc_dump(const char *_objname) return -1; } - printf("%s+%lx:", name, rela.r_addend); + printf("%s+%llx:", name, (unsigned long long)rela.r_addend); } else { - printf("%lx:", orc_ip_addr + (i * sizeof(int)) + orc_ip[i]); + printf("%llx:", (unsigned long long)(orc_ip_addr + (i * sizeof(int)) + orc_ip[i])); } From da8eb8ad0e695ed34b114df28505d4dd5b4ea693 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 11 Dec 2017 10:38:36 -0800 Subject: [PATCH 0737/1013] x86/decoder: Fix and update the opcodes map commit f5b5fab1780c98b74526dbac527574bd02dc16f8 upstream Update x86-opcode-map.txt based on the October 2017 Intel SDM publication. Fix INVPID to INVVPID. Add UD0 and UD1 instruction opcodes. Also sync the objtool and perf tooling copies of this file. 
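These map entries are consumed through the generated instruction-attribute tables rather than read directly; a rough sketch of the (pre-existing) generation step, as wired up for objtool in the Build rules earlier in this series, with the in-kernel decoder running the same awk script over its own copy of the map:

    awk -f arch/x86/tools/gen-insn-attr-x86.awk arch/x86/lib/x86-opcode-map.txt > inat-tables.c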
Signed-off-by: Randy Dunlap Acked-by: Masami Hiramatsu Cc: Josh Poimboeuf Cc: Linus Torvalds Cc: Masami Hiramatsu Cc: Peter Zijlstra Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/aac062d7-c0f6-96e3-5c92-ed299e2bd3da@infradead.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/lib/x86-opcode-map.txt | 13 +++++++++++-- tools/objtool/arch/x86/lib/x86-opcode-map.txt | 15 ++++++++++++--- .../perf/util/intel-pt-decoder/x86-opcode-map.txt | 15 ++++++++++++--- 3 files changed, 35 insertions(+), 8 deletions(-) diff --git a/arch/x86/lib/x86-opcode-map.txt b/arch/x86/lib/x86-opcode-map.txt index c4d55919fac1..e0b85930dd77 100644 --- a/arch/x86/lib/x86-opcode-map.txt +++ b/arch/x86/lib/x86-opcode-map.txt @@ -607,7 +607,7 @@ fb: psubq Pq,Qq | vpsubq Vx,Hx,Wx (66),(v1) fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1) fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1) fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1) -ff: +ff: UD0 EndTable Table: 3-byte opcode 1 (0x0f 0x38) @@ -717,7 +717,7 @@ AVXcode: 2 7e: vpermt2d/q Vx,Hx,Wx (66),(ev) 7f: vpermt2ps/d Vx,Hx,Wx (66),(ev) 80: INVEPT Gy,Mdq (66) -81: INVPID Gy,Mdq (66) +81: INVVPID Gy,Mdq (66) 82: INVPCID Gy,Mdq (66) 83: vpmultishiftqb Vx,Hx,Wx (66),(ev) 88: vexpandps/d Vpd,Wpd (66),(ev) @@ -970,6 +970,15 @@ GrpTable: Grp9 EndTable GrpTable: Grp10 +# all are UD1 +0: UD1 +1: UD1 +2: UD1 +3: UD1 +4: UD1 +5: UD1 +6: UD1 +7: UD1 EndTable # Grp11A and Grp11B are expressed as Grp11 in Intel SDM diff --git a/tools/objtool/arch/x86/lib/x86-opcode-map.txt b/tools/objtool/arch/x86/lib/x86-opcode-map.txt index 12e377184ee4..e0b85930dd77 100644 --- a/tools/objtool/arch/x86/lib/x86-opcode-map.txt +++ b/tools/objtool/arch/x86/lib/x86-opcode-map.txt @@ -607,7 +607,7 @@ fb: psubq Pq,Qq | vpsubq Vx,Hx,Wx (66),(v1) fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1) fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1) fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1) -ff: +ff: UD0 EndTable Table: 3-byte opcode 1 (0x0f 0x38) @@ -717,7 +717,7 @@ AVXcode: 2 7e: vpermt2d/q Vx,Hx,Wx (66),(ev) 7f: vpermt2ps/d Vx,Hx,Wx (66),(ev) 80: INVEPT Gy,Mdq (66) -81: INVPID Gy,Mdq (66) +81: INVVPID Gy,Mdq (66) 82: INVPCID Gy,Mdq (66) 83: vpmultishiftqb Vx,Hx,Wx (66),(ev) 88: vexpandps/d Vpd,Wpd (66),(ev) @@ -896,7 +896,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb @@ -970,6 +970,15 @@ GrpTable: Grp9 EndTable GrpTable: Grp10 +# all are UD1 +0: UD1 +1: UD1 +2: UD1 +3: UD1 +4: UD1 +5: UD1 +6: UD1 +7: UD1 EndTable # Grp11A and Grp11B are expressed as Grp11 in Intel SDM diff --git a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt index 12e377184ee4..e0b85930dd77 100644 --- a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt +++ b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt @@ -607,7 +607,7 @@ fb: psubq Pq,Qq | vpsubq Vx,Hx,Wx (66),(v1) fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1) fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1) fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1) -ff: +ff: UD0 EndTable Table: 3-byte opcode 1 (0x0f 0x38) @@ -717,7 +717,7 @@ AVXcode: 2 7e: vpermt2d/q Vx,Hx,Wx (66),(ev) 7f: vpermt2ps/d Vx,Hx,Wx (66),(ev) 80: INVEPT Gy,Mdq (66) -81: INVPID Gy,Mdq (66) +81: INVVPID Gy,Mdq (66) 82: INVPCID Gy,Mdq (66) 83: vpmultishiftqb Vx,Hx,Wx (66),(ev) 88: vexpandps/d Vpd,Wpd (66),(ev) @@ -896,7 +896,7 @@ EndTable GrpTable: Grp3_1 0: TEST Eb,Ib -1: +1: TEST Eb,Ib 2: NOT Eb 3: NEG Eb 4: MUL AL,Eb @@ -970,6 +970,15 @@ GrpTable: Grp9 EndTable GrpTable: Grp10 +# all are UD1 +0: UD1 +1: UD1 
+2: UD1 +3: UD1 +4: UD1 +5: UD1 +6: UD1 +7: UD1 EndTable # Grp11A and Grp11B are expressed as Grp11 in Intel SDM From adc37e209d954ffa235d036c1b71ca84691ae4f7 Mon Sep 17 00:00:00 2001 From: Ricardo Neri Date: Fri, 27 Oct 2017 13:25:40 -0700 Subject: [PATCH 0738/1013] x86/insn-eval: Add utility functions to get segment selector commit 32d0b95300db03c2b23b2ea2c94769a4a138e79d upstream. [note, only the inat.h portion, to get objtool back in sync - gregkh] b0caa8c8c6bbc422bc3c32b64852d6d618f32b49 Mon Sep 17 00:00:00 2001 When computing a linear address and segmentation is used, we need to know the base address of the segment involved in the computation. In most of the cases, the segment base address will be zero as in USER_DS/USER32_DS. However, it may be possible that a user space program defines its own segments via a local descriptor table. In such a case, the segment base address may not be zero. Thus, the segment base address is needed to calculate correctly the linear address. If running in protected mode, the segment selector to be used when computing a linear address is determined by either any of segment override prefixes in the instruction or inferred from the registers involved in the computation of the effective address; in that order. Also, there are cases when the segment override prefixes shall be ignored (i.e., code segments are always selected by the CS segment register; string instructions always use the ES segment register when using rDI register as operand). In long mode, segment registers are ignored, except for FS and GS. In these two cases, base addresses are obtained from the respective MSRs. For clarity, this process can be split into four steps (and an equal number of functions): determine if segment prefixes overrides can be used; parse the segment override prefixes, and use them if found; if not found or cannot be used, use the default segment registers associated with the operand registers. Once the segment register to use has been identified, read its value to obtain the segment selector. The method to obtain the segment selector depends on several factors. In 32-bit builds, segment selectors are saved into a pt_regs structure when switching to kernel mode. The same is also true for virtual-8086 mode. In 64-bit builds, segmentation is mostly ignored, except when running a program in 32-bit legacy mode. In this case, CS and SS can be obtained from pt_regs. DS, ES, FS and GS can be read directly from the respective segment registers. In order to identify the segment registers, a new set of #defines is introduced. It also includes two special identifiers. One of them indicates when the default segment register associated with instruction operands shall be used. Another one indicates that the contents of the segment register shall be ignored; this identifier is used when in long mode. Improvements-by: Borislav Petkov Signed-off-by: Ricardo Neri Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: "Michael S. Tsirkin" Cc: Peter Zijlstra Cc: Dave Hansen Cc: ricardo.neri@intel.com Cc: Adrian Hunter Cc: Paul Gortmaker Cc: Huang Rui Cc: Qiaowei Ren Cc: Shuah Khan Cc: Kees Cook Cc: Jonathan Corbet Cc: Jiri Slaby Cc: Dmitry Vyukov Cc: "Ravi V. 
Shankar" Cc: Chris Metcalf Cc: Brian Gerst Cc: Arnaldo Carvalho de Melo Cc: Andy Lutomirski Cc: Colin Ian King Cc: Chen Yucong Cc: Adam Buchbinder Cc: Vlastimil Babka Cc: Lorenzo Stoakes Cc: Masami Hiramatsu Cc: Paolo Bonzini Cc: Andrew Morton Cc: Thomas Garnier Link: https://lkml.kernel.org/r/1509135945-13762-14-git-send-email-ricardo.neri-calderon@linux.intel.com Cc: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/inat.h | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/include/asm/inat.h b/arch/x86/include/asm/inat.h index 02aff0867211..1c78580e58be 100644 --- a/arch/x86/include/asm/inat.h +++ b/arch/x86/include/asm/inat.h @@ -97,6 +97,16 @@ #define INAT_MAKE_GROUP(grp) ((grp << INAT_GRP_OFFS) | INAT_MODRM) #define INAT_MAKE_IMM(imm) (imm << INAT_IMM_OFFS) +/* Identifiers for segment registers */ +#define INAT_SEG_REG_IGNORE 0 +#define INAT_SEG_REG_DEFAULT 1 +#define INAT_SEG_REG_CS 2 +#define INAT_SEG_REG_SS 3 +#define INAT_SEG_REG_DS 4 +#define INAT_SEG_REG_ES 5 +#define INAT_SEG_REG_FS 6 +#define INAT_SEG_REG_GS 7 + /* Attribute search APIs */ extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode); extern int inat_get_last_prefix_id(insn_byte_t last_pfx); From 662fd946aa4637a822bff0ec02afd59a8e2c2ba2 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 20 Dec 2017 18:02:34 +0100 Subject: [PATCH 0739/1013] x86/Kconfig: Limit NR_CPUS on 32-bit to a sane amount commit 7bbcbd3d1cdcbacd0f9f8dc4c98d550972f1ca30 upstream. The recent cpu_entry_area changes fail to compile on 32-bit when BIGSMP=y and NR_CPUS=512, because the fixmap area becomes too big. Limit the number of CPUs with BIGSMP to 64, which is already way to big for 32-bit, but it's at least a working limitation. We performed a quick survey of 32-bit-only machines that might be affected by this change negatively, but found none. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/Kconfig | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 48646160eb83..592c974d4558 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -925,7 +925,8 @@ config MAXSMP config NR_CPUS int "Maximum number of CPUs" if SMP && !MAXSMP range 2 8 if SMP && X86_32 && !X86_BIGSMP - range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK + range 2 64 if SMP && X86_32 && X86_BIGSMP + range 2 512 if SMP && !MAXSMP && !CPUMASK_OFFSTACK && X86_64 range 2 8192 if SMP && !MAXSMP && CPUMASK_OFFSTACK && X86_64 default "1" if !SMP default "8192" if MAXSMP From c4bc398080d80008e7d4a8cff528b8f3e8d8fb63 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 16 Dec 2017 01:14:39 +0100 Subject: [PATCH 0740/1013] x86/mm/dump_pagetables: Check PAGE_PRESENT for real commit c05344947b37f7cda726e802457370bc6eac4d26 upstream. The check for a present page in printk_prot(): if (!pgprot_val(prot)) { /* Not present */ is bogus. If a PTE is set to PAGE_NONE then the pgprot_val is not zero and the entry is decoded in bogus ways, e.g. as RX GLB. That is confusing when analyzing mapping correctness. Check for the present bit to make an informed decision. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/dump_pagetables.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 5e3ac6fe6c9e..1014cfb21c2c 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -140,7 +140,7 @@ static void printk_prot(struct seq_file *m, pgprot_t prot, int level, bool dmsg) static const char * const level_name[] = { "cr3", "pgd", "p4d", "pud", "pmd", "pte" }; - if (!pgprot_val(prot)) { + if (!(pr & _PAGE_PRESENT)) { /* Not present */ pt_dump_cont_printf(m, dmsg, " "); } else { From 7b45ad6e0fd7710cb5a5fb9a6bdd0632f510bf50 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 20 Dec 2017 18:07:42 +0100 Subject: [PATCH 0741/1013] x86/mm/dump_pagetables: Make the address hints correct and readable commit 146122e24bdf208015d629babba673e28d090709 upstream. The address hints are a trainwreck. The array entry numbers have to kept magically in sync with the actual hints, which is doomed as some of the array members are initialized at runtime via the entry numbers. Designated initializers have been around before this code was implemented.... Use the entry numbers to populate the address hints array and add the missing bits and pieces. Split 32 and 64 bit for readability sake. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/dump_pagetables.c | 90 +++++++++++++++++++++-------------- 1 file changed, 53 insertions(+), 37 deletions(-) diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 1014cfb21c2c..fdf09d8f98da 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -44,10 +44,12 @@ struct addr_marker { unsigned long max_lines; }; -/* indices for address_markers; keep sync'd w/ address_markers below */ +/* Address space markers hints */ + +#ifdef CONFIG_X86_64 + enum address_markers_idx { USER_SPACE_NR = 0, -#ifdef CONFIG_X86_64 KERNEL_SPACE_NR, LOW_KERNEL_NR, VMALLOC_START_NR, @@ -56,56 +58,70 @@ enum address_markers_idx { KASAN_SHADOW_START_NR, KASAN_SHADOW_END_NR, #endif -# ifdef CONFIG_X86_ESPFIX64 +#ifdef CONFIG_X86_ESPFIX64 ESPFIX_START_NR, -# endif +#endif +#ifdef CONFIG_EFI + EFI_END_NR, +#endif HIGH_KERNEL_NR, MODULES_VADDR_NR, MODULES_END_NR, -#else + FIXADDR_START_NR, + END_OF_SPACE_NR, +}; + +static struct addr_marker address_markers[] = { + [USER_SPACE_NR] = { 0, "User Space" }, + [KERNEL_SPACE_NR] = { (1UL << 63), "Kernel Space" }, + [LOW_KERNEL_NR] = { 0UL, "Low Kernel Mapping" }, + [VMALLOC_START_NR] = { 0UL, "vmalloc() Area" }, + [VMEMMAP_START_NR] = { 0UL, "Vmemmap" }, +#ifdef CONFIG_KASAN + [KASAN_SHADOW_START_NR] = { KASAN_SHADOW_START, "KASAN shadow" }, + [KASAN_SHADOW_END_NR] = { KASAN_SHADOW_END, "KASAN shadow end" }, +#endif +#ifdef CONFIG_X86_ESPFIX64 + [ESPFIX_START_NR] = { ESPFIX_BASE_ADDR, "ESPfix Area", 16 }, +#endif +#ifdef CONFIG_EFI + [EFI_END_NR] = { EFI_VA_END, "EFI Runtime Services" }, +#endif + [HIGH_KERNEL_NR] = { __START_KERNEL_map, "High Kernel Mapping" }, + [MODULES_VADDR_NR] = { MODULES_VADDR, "Modules" }, + [MODULES_END_NR] = { MODULES_END, "End Modules" }, + [FIXADDR_START_NR] = { 
FIXADDR_START, "Fixmap Area" }, + [END_OF_SPACE_NR] = { -1, NULL } +}; + +#else /* CONFIG_X86_64 */ + +enum address_markers_idx { + USER_SPACE_NR = 0, KERNEL_SPACE_NR, VMALLOC_START_NR, VMALLOC_END_NR, -# ifdef CONFIG_HIGHMEM +#ifdef CONFIG_HIGHMEM PKMAP_BASE_NR, -# endif - FIXADDR_START_NR, #endif + FIXADDR_START_NR, + END_OF_SPACE_NR, }; -/* Address space markers hints */ static struct addr_marker address_markers[] = { - { 0, "User Space" }, -#ifdef CONFIG_X86_64 - { 0x8000000000000000UL, "Kernel Space" }, - { 0/* PAGE_OFFSET */, "Low Kernel Mapping" }, - { 0/* VMALLOC_START */, "vmalloc() Area" }, - { 0/* VMEMMAP_START */, "Vmemmap" }, -#ifdef CONFIG_KASAN - { KASAN_SHADOW_START, "KASAN shadow" }, - { KASAN_SHADOW_END, "KASAN shadow end" }, + [USER_SPACE_NR] = { 0, "User Space" }, + [KERNEL_SPACE_NR] = { PAGE_OFFSET, "Kernel Mapping" }, + [VMALLOC_START_NR] = { 0UL, "vmalloc() Area" }, + [VMALLOC_END_NR] = { 0UL, "vmalloc() End" }, +#ifdef CONFIG_HIGHMEM + [PKMAP_BASE_NR] = { 0UL, "Persistent kmap() Area" }, #endif -# ifdef CONFIG_X86_ESPFIX64 - { ESPFIX_BASE_ADDR, "ESPfix Area", 16 }, -# endif -# ifdef CONFIG_EFI - { EFI_VA_END, "EFI Runtime Services" }, -# endif - { __START_KERNEL_map, "High Kernel Mapping" }, - { MODULES_VADDR, "Modules" }, - { MODULES_END, "End Modules" }, -#else - { PAGE_OFFSET, "Kernel Mapping" }, - { 0/* VMALLOC_START */, "vmalloc() Area" }, - { 0/*VMALLOC_END*/, "vmalloc() End" }, -# ifdef CONFIG_HIGHMEM - { 0/*PKMAP_BASE*/, "Persistent kmap() Area" }, -# endif - { 0/*FIXADDR_START*/, "Fixmap Area" }, -#endif - { -1, NULL } /* End of list */ + [FIXADDR_START_NR] = { 0UL, "Fixmap area" }, + [END_OF_SPACE_NR] = { -1, NULL } }; +#endif /* !CONFIG_X86_64 */ + /* Multipliers for offsets within the PTEs */ #define PTE_LEVEL_MULT (PAGE_SIZE) #define PMD_LEVEL_MULT (PTRS_PER_PTE * PTE_LEVEL_MULT) From beb899c4bcdfb006293a593ffe31c9f13ec7c66b Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Sun, 10 Dec 2017 22:47:19 -0800 Subject: [PATCH 0742/1013] x86/vsyscall/64: Explicitly set _PAGE_USER in the pagetable hierarchy commit 49275fef986abfb8b476e4708aaecc07e7d3e087 upstream. The kernel is very erratic as to which pagetables have _PAGE_USER set. The vsyscall page gets lucky: it seems that all of the relevant pagetables are among the apparently arbitrary ones that set _PAGE_USER. Rather than relying on chance, just explicitly set _PAGE_USER. This will let us clean up pagetable setup to stop setting _PAGE_USER. The added code can also be reused by pagetable isolation to manage the _PAGE_USER bit in the usermode tables. [ tglx: Folded paravirt fix from Juergen Gross ] Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/vsyscall/vsyscall_64.c | 34 ++++++++++++++++++++++++++- 1 file changed, 33 insertions(+), 1 deletion(-) diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index f279ba2643dc..daad57c76e42 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -37,6 +37,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include "vsyscall_trace.h" @@ -329,16 +330,47 @@ int in_gate_area_no_mm(unsigned long addr) return vsyscall_mode != NONE && (addr & PAGE_MASK) == VSYSCALL_ADDR; } +/* + * The VSYSCALL page is the only user-accessible page in the kernel address + * range. Normally, the kernel page tables can have _PAGE_USER clear, but + * the tables covering VSYSCALL_ADDR need _PAGE_USER set if vsyscalls + * are enabled. + * + * Some day we may create a "minimal" vsyscall mode in which we emulate + * vsyscalls but leave the page not present. If so, we skip calling + * this. + */ +static void __init set_vsyscall_pgtable_user_bits(void) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + + pgd = pgd_offset_k(VSYSCALL_ADDR); + set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER)); + p4d = p4d_offset(pgd, VSYSCALL_ADDR); +#if CONFIG_PGTABLE_LEVELS >= 5 + p4d->p4d |= _PAGE_USER; +#endif + pud = pud_offset(p4d, VSYSCALL_ADDR); + set_pud(pud, __pud(pud_val(*pud) | _PAGE_USER)); + pmd = pmd_offset(pud, VSYSCALL_ADDR); + set_pmd(pmd, __pmd(pmd_val(*pmd) | _PAGE_USER)); +} + void __init map_vsyscall(void) { extern char __vsyscall_page; unsigned long physaddr_vsyscall = __pa_symbol(&__vsyscall_page); - if (vsyscall_mode != NONE) + if (vsyscall_mode != NONE) { __set_fixmap(VSYSCALL_PAGE, physaddr_vsyscall, vsyscall_mode == NATIVE ? PAGE_KERNEL_VSYSCALL : PAGE_KERNEL_VVAR); + set_vsyscall_pgtable_user_bits(); + } BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) != (unsigned long)VSYSCALL_ADDR); From 49c01662d3177e1814ceb52337add473fb4d264b Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Sun, 10 Dec 2017 22:47:20 -0800 Subject: [PATCH 0743/1013] x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode commit 4831b779403a836158917d59a7ca880483c67378 upstream. If something goes wrong with pagetable setup, vsyscall=native will accidentally fall back to emulation. Make it warn and fail so that we notice. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/vsyscall/vsyscall_64.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index daad57c76e42..1faf40f2dda9 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -139,6 +139,10 @@ bool emulate_vsyscall(struct pt_regs *regs, unsigned long address) WARN_ON_ONCE(address != regs->ip); + /* This should be unreachable in NATIVE mode. 
*/ + if (WARN_ON(vsyscall_mode == NATIVE)) + return false; + if (vsyscall_mode == NONE) { warn_bad_vsyscall(KERN_INFO, regs, "vsyscall attempted with vsyscall=none"); From ee8e8b2df6d3f2a4ce25182d18964f891760ac65 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 14 Dec 2017 12:27:29 +0100 Subject: [PATCH 0744/1013] arch, mm: Allow arch_dup_mmap() to fail commit c10e83f598d08046dd1ebc8360d4bb12d802d51b upstream. In order to sanitize the LDT initialization on x86 arch_dup_mmap() must be allowed to fail. Fix up all instances. Signed-off-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Cc: Andy Lutomirski Cc: Andy Lutomirsky Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: dan.j.williams@intel.com Cc: hughd@google.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/include/asm/mmu_context.h | 5 +++-- arch/um/include/asm/mmu_context.h | 3 ++- arch/unicore32/include/asm/mmu_context.h | 5 +++-- arch/x86/include/asm/mmu_context.h | 4 ++-- include/asm-generic/mm_hooks.h | 5 +++-- kernel/fork.c | 3 +-- 6 files changed, 14 insertions(+), 11 deletions(-) diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h index 492d8140a395..44fdf4786638 100644 --- a/arch/powerpc/include/asm/mmu_context.h +++ b/arch/powerpc/include/asm/mmu_context.h @@ -114,9 +114,10 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, #endif } -static inline void arch_dup_mmap(struct mm_struct *oldmm, - struct mm_struct *mm) +static inline int arch_dup_mmap(struct mm_struct *oldmm, + struct mm_struct *mm) { + return 0; } static inline void arch_exit_mmap(struct mm_struct *mm) diff --git a/arch/um/include/asm/mmu_context.h b/arch/um/include/asm/mmu_context.h index b668e351fd6c..fca34b2177e2 100644 --- a/arch/um/include/asm/mmu_context.h +++ b/arch/um/include/asm/mmu_context.h @@ -15,9 +15,10 @@ extern void uml_setup_stubs(struct mm_struct *mm); /* * Needed since we do not use the asm-generic/mm_hooks.h: */ -static inline void arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm) +static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm) { uml_setup_stubs(mm); + return 0; } extern void arch_exit_mmap(struct mm_struct *mm); static inline void arch_unmap(struct mm_struct *mm, diff --git a/arch/unicore32/include/asm/mmu_context.h b/arch/unicore32/include/asm/mmu_context.h index 59b06b48f27d..5c205a9cb5a6 100644 --- a/arch/unicore32/include/asm/mmu_context.h +++ b/arch/unicore32/include/asm/mmu_context.h @@ -81,9 +81,10 @@ do { \ } \ } while (0) -static inline void arch_dup_mmap(struct mm_struct *oldmm, - struct mm_struct *mm) +static inline int arch_dup_mmap(struct mm_struct *oldmm, + struct mm_struct *mm) { + return 0; } static inline void arch_unmap(struct mm_struct *mm, diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 6d16d15d09a0..c76162439c8a 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -176,10 +176,10 @@ do { \ } while (0) #endif -static inline void arch_dup_mmap(struct mm_struct *oldmm, - struct mm_struct *mm) +static inline int arch_dup_mmap(struct mm_struct *oldmm, struct 
mm_struct *mm) { paravirt_arch_dup_mmap(oldmm, mm); + return 0; } static inline void arch_exit_mmap(struct mm_struct *mm) diff --git a/include/asm-generic/mm_hooks.h b/include/asm-generic/mm_hooks.h index ea189d88a3cc..8ac4e68a12f0 100644 --- a/include/asm-generic/mm_hooks.h +++ b/include/asm-generic/mm_hooks.h @@ -7,9 +7,10 @@ #ifndef _ASM_GENERIC_MM_HOOKS_H #define _ASM_GENERIC_MM_HOOKS_H -static inline void arch_dup_mmap(struct mm_struct *oldmm, - struct mm_struct *mm) +static inline int arch_dup_mmap(struct mm_struct *oldmm, + struct mm_struct *mm) { + return 0; } static inline void arch_exit_mmap(struct mm_struct *mm) diff --git a/kernel/fork.c b/kernel/fork.c index 07cc743698d3..500ce64517d9 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -721,8 +721,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm, goto out; } /* a new mm has just been created */ - arch_dup_mmap(oldmm, mm); - retval = 0; + retval = arch_dup_mmap(oldmm, mm); out: up_write(&mm->mmap_sem); flush_tlb_mm(oldmm); From b17459342c55a209014a783d74bff6a0a5374339 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 14 Dec 2017 12:27:30 +0100 Subject: [PATCH 0745/1013] x86/ldt: Rework locking commit c2b3496bb30bd159e9de42e5c952e1f1f33c9a77 upstream. The LDT is duplicated on fork() and on exec(), which is wrong as exec() should start from a clean state, i.e. without LDT. To fix this the LDT duplication code will be moved into arch_dup_mmap() which is only called for fork(). This introduces a locking problem. arch_dup_mmap() holds mmap_sem of the parent process, but the LDT duplication code needs to acquire mm->context.lock to access the LDT data safely, which is the reverse lock order of write_ldt() where mmap_sem nests into context.lock. Solve this by introducing a new rw semaphore which serializes the read/write_ldt() syscall operations and use context.lock to protect the actual installment of the LDT descriptor. So context.lock stabilizes mm->context.ldt and can nest inside of the new semaphore or mmap_sem. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Andy Lutomirsky Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: dan.j.williams@intel.com Cc: hughd@google.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/mmu.h | 4 +++- arch/x86/include/asm/mmu_context.h | 2 ++ arch/x86/kernel/ldt.c | 33 +++++++++++++++++++----------- 3 files changed, 26 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h index 9ea26f167497..5ff3e8af2c20 100644 --- a/arch/x86/include/asm/mmu.h +++ b/arch/x86/include/asm/mmu.h @@ -3,6 +3,7 @@ #define _ASM_X86_MMU_H #include +#include #include #include @@ -27,7 +28,8 @@ typedef struct { atomic64_t tlb_gen; #ifdef CONFIG_MODIFY_LDT_SYSCALL - struct ldt_struct *ldt; + struct rw_semaphore ldt_usr_sem; + struct ldt_struct *ldt; #endif #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index c76162439c8a..4fdbe5efe535 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -132,6 +132,8 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk); static inline int init_new_context(struct task_struct *tsk, struct mm_struct *mm) { + mutex_init(&mm->context.lock); + mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id); atomic64_set(&mm->context.tlb_gen, 0); diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 1c1eae961340..1600aebc1ec7 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -5,6 +5,11 @@ * Copyright (C) 2002 Andi Kleen * * This handles calls from both 32bit and 64bit mode. + * + * Lock order: + * contex.ldt_usr_sem + * mmap_sem + * context.lock */ #include @@ -42,7 +47,7 @@ static void refresh_ldt_segments(void) #endif } -/* context.lock is held for us, so we don't need any locking. */ +/* context.lock is held by the task which issued the smp function call */ static void flush_ldt(void *__mm) { struct mm_struct *mm = __mm; @@ -99,15 +104,17 @@ static void finalize_ldt_struct(struct ldt_struct *ldt) paravirt_alloc_ldt(ldt->entries, ldt->nr_entries); } -/* context.lock is held */ -static void install_ldt(struct mm_struct *current_mm, - struct ldt_struct *ldt) +static void install_ldt(struct mm_struct *mm, struct ldt_struct *ldt) { + mutex_lock(&mm->context.lock); + /* Synchronizes with READ_ONCE in load_mm_ldt. */ - smp_store_release(¤t_mm->context.ldt, ldt); + smp_store_release(&mm->context.ldt, ldt); - /* Activate the LDT for all CPUs using current_mm. */ - on_each_cpu_mask(mm_cpumask(current_mm), flush_ldt, current_mm, true); + /* Activate the LDT for all CPUs using currents mm. 
*/ + on_each_cpu_mask(mm_cpumask(mm), flush_ldt, mm, true); + + mutex_unlock(&mm->context.lock); } static void free_ldt_struct(struct ldt_struct *ldt) @@ -133,7 +140,8 @@ int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm) struct mm_struct *old_mm; int retval = 0; - mutex_init(&mm->context.lock); + init_rwsem(&mm->context.ldt_usr_sem); + old_mm = current->mm; if (!old_mm) { mm->context.ldt = NULL; @@ -180,7 +188,7 @@ static int read_ldt(void __user *ptr, unsigned long bytecount) unsigned long entries_size; int retval; - mutex_lock(&mm->context.lock); + down_read(&mm->context.ldt_usr_sem); if (!mm->context.ldt) { retval = 0; @@ -209,7 +217,7 @@ static int read_ldt(void __user *ptr, unsigned long bytecount) retval = bytecount; out_unlock: - mutex_unlock(&mm->context.lock); + up_read(&mm->context.ldt_usr_sem); return retval; } @@ -269,7 +277,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) ldt.avl = 0; } - mutex_lock(&mm->context.lock); + if (down_write_killable(&mm->context.ldt_usr_sem)) + return -EINTR; old_ldt = mm->context.ldt; old_nr_entries = old_ldt ? old_ldt->nr_entries : 0; @@ -291,7 +300,7 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) error = 0; out_unlock: - mutex_unlock(&mm->context.lock); + up_write(&mm->context.ldt_usr_sem); out: return error; } From 2c8e9099aecec2baaac8d34c7b823493f2d0eeed Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 14 Dec 2017 12:27:31 +0100 Subject: [PATCH 0746/1013] x86/ldt: Prevent LDT inheritance on exec commit a4828f81037f491b2cc986595e3a969a6eeb2fb5 upstream. The LDT is inherited across fork() or exec(), but that makes no sense at all because exec() is supposed to start the process clean. The reason why this happens is that init_new_context_ldt() is called from init_new_context() which obviously needs to be called for both fork() and exec(). It would be surprising if anything relies on that behaviour, so it seems to be safe to remove that misfeature. Split the context initialization into two parts. Clear the LDT pointer and initialize the mutex from the general context init and move the LDT duplication to arch_dup_mmap() which is only called on fork(). Signed-off-by: Thomas Gleixner Signed-off-by: Peter Zijlstra Cc: Andy Lutomirski Cc: Andy Lutomirsky Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Will Deacon Cc: aliguori@amazon.com Cc: dan.j.williams@intel.com Cc: hughd@google.com Cc: keescook@google.com Cc: kirill.shutemov@linux.intel.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/mmu_context.h | 21 ++++++++++++++------- arch/x86/kernel/ldt.c | 18 +++++------------- tools/testing/selftests/x86/ldt_gdt.c | 9 +++------ 3 files changed, 22 insertions(+), 26 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 4fdbe5efe535..5e25423bf9bb 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -57,11 +57,17 @@ struct ldt_struct { /* * Used for LDT copy/destruction. 
*/ -int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm); +static inline void init_new_context_ldt(struct mm_struct *mm) +{ + mm->context.ldt = NULL; + init_rwsem(&mm->context.ldt_usr_sem); +} +int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm); void destroy_context_ldt(struct mm_struct *mm); #else /* CONFIG_MODIFY_LDT_SYSCALL */ -static inline int init_new_context_ldt(struct task_struct *tsk, - struct mm_struct *mm) +static inline void init_new_context_ldt(struct mm_struct *mm) { } +static inline int ldt_dup_context(struct mm_struct *oldmm, + struct mm_struct *mm) { return 0; } @@ -137,15 +143,16 @@ static inline int init_new_context(struct task_struct *tsk, mm->context.ctx_id = atomic64_inc_return(&last_mm_ctx_id); atomic64_set(&mm->context.tlb_gen, 0); - #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS +#ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS if (cpu_feature_enabled(X86_FEATURE_OSPKE)) { /* pkey 0 is the default and always allocated */ mm->context.pkey_allocation_map = 0x1; /* -1 means unallocated or invalid */ mm->context.execute_only_pkey = -1; } - #endif - return init_new_context_ldt(tsk, mm); +#endif + init_new_context_ldt(mm); + return 0; } static inline void destroy_context(struct mm_struct *mm) { @@ -181,7 +188,7 @@ do { \ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm) { paravirt_arch_dup_mmap(oldmm, mm); - return 0; + return ldt_dup_context(oldmm, mm); } static inline void arch_exit_mmap(struct mm_struct *mm) diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 1600aebc1ec7..a6b5d62f45a7 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -131,28 +131,20 @@ static void free_ldt_struct(struct ldt_struct *ldt) } /* - * we do not have to muck with descriptors here, that is - * done in switch_mm() as needed. + * Called on fork from arch_dup_mmap(). Just copy the current LDT state, + * the new task is not running, so nothing can be installed. */ -int init_new_context_ldt(struct task_struct *tsk, struct mm_struct *mm) +int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm) { struct ldt_struct *new_ldt; - struct mm_struct *old_mm; int retval = 0; - init_rwsem(&mm->context.ldt_usr_sem); - - old_mm = current->mm; - if (!old_mm) { - mm->context.ldt = NULL; + if (!old_mm) return 0; - } mutex_lock(&old_mm->context.lock); - if (!old_mm->context.ldt) { - mm->context.ldt = NULL; + if (!old_mm->context.ldt) goto out_unlock; - } new_ldt = alloc_ldt_struct(old_mm->context.ldt->nr_entries); if (!new_ldt) { diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c index 66e5ce5b91f0..0304ffb714f2 100644 --- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -627,13 +627,10 @@ static void do_multicpu_tests(void) static int finish_exec_test(void) { /* - * In a sensible world, this would be check_invalid_segment(0, 1); - * For better or for worse, though, the LDT is inherited across exec. - * We can probably change this safely, but for now we test it. + * Older kernel versions did inherit the LDT on exec() which is + * wrong because exec() starts from a clean state. */ - check_valid_segment(0, 1, - AR_DPL3 | AR_TYPE_XRCODE | AR_S | AR_P | AR_DB, - 42, true); + check_invalid_segment(0, 1); return nerrs ? 
1 : 0; } From 88569f5e3a5648331066b49dd6918b8c70067ae7 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 12 Dec 2017 07:56:43 -0800 Subject: [PATCH 0747/1013] x86/mm/64: Improve the memory map documentation commit 5a7ccf4754fb3660569a6de52ba7f7fc3dfaf280 upstream. The old docs had the vsyscall range wrong and were missing the fixmap. Fix both. There used to be 8 MB reserved for future vsyscalls, but that's long gone. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 3448e675b462..83ca5a3b90ac 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -19,8 +19,9 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space ... unused hole ... ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0 -ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space (variable) -ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls +ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space (variable) +[fixmap start] - ffffffffff5fffff kernel-internal fixmap range +ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole Virtual memory map with 5 level page tables: @@ -41,8 +42,9 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space ... unused hole ... ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0 -ffffffffa0000000 - ffffffffff5fffff (=1526 MB) module mapping space -ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls +ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space +[fixmap start] - ffffffffff5fffff kernel-internal fixmap range +ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole Architecture defines a 64-bit virtual address. Implementations can support From d8f29ac7363778f38dab7bcb79ca076b8534904c Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:54 +0100 Subject: [PATCH 0748/1013] x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation commit e8ffe96e5933d417195268478479933d56213a3f upstream. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 83ca5a3b90ac..63a41671d25b 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -1,6 +1,4 @@ - - Virtual memory map with 4 level page tables: 0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm @@ -49,8 +47,9 @@ ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole Architecture defines a 64-bit virtual address. Implementations can support less. Currently supported are 48- and 57-bit virtual addresses. Bits 63 -through to the most-significant implemented bit are set to either all ones -or all zero. This causes hole between user space and kernel addresses. +through to the most-significant implemented bit are sign extended. +This causes hole between user space and kernel addresses if you interpret them +as unsigned. The direct mapping covers all memory in the system up to the highest memory address (this means in some cases it can also include PCI memory @@ -60,9 +59,6 @@ vmalloc space is lazily synchronized into the different PML4/PML5 pages of the processes using the page fault handler, with init_top_pgt as reference. -Current X86-64 implementations support up to 46 bits of address space (64 TB), -which is our current limit. This expands into MBZ space in the page tables. - We map EFI runtime services in the 'efi_pgd' PGD in a 64Gb large virtual memory window (this size is arbitrary, it can be raised later if needed). The mappings are not part of any other kernel PGD and are only available @@ -74,5 +70,3 @@ following fixmap section. Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all physical memory, vmalloc/ioremap space and virtual memory map are randomized. Their order is preserved but their base will be offset early at boot time. - --Andi Kleen, Jul 2004 From 06f9acfe0abcc9386bbeee8438b87f90ff2e04d1 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 17:25:07 -0800 Subject: [PATCH 0749/1013] x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack commit 4fe2d8b11a370af286287a2661de9d4e6c9a145a upstream. If the kernel oopses while on the trampoline stack, it will print "<SYSENTER>" even if SYSENTER is not involved. That is rather confusing. The "SYSENTER" stack is used for a lot more than SYSENTER now. Give it a better string to display in stack dumps, and rename the kernel code to match. Also move the 32-bit code over to the new naming even though it still uses the entry stack only for SYSENTER. Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: H.
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_32.S | 12 ++++++------ arch/x86/entry/entry_64.S | 4 ++-- arch/x86/include/asm/fixmap.h | 8 ++++---- arch/x86/include/asm/processor.h | 6 +++--- arch/x86/include/asm/stacktrace.h | 4 ++-- arch/x86/kernel/asm-offsets.c | 4 ++-- arch/x86/kernel/asm-offsets_32.c | 2 +- arch/x86/kernel/cpu/common.c | 14 +++++++------- arch/x86/kernel/dumpstack.c | 10 +++++----- arch/x86/kernel/dumpstack_32.c | 6 +++--- arch/x86/kernel/dumpstack_64.c | 12 +++++++++--- 11 files changed, 44 insertions(+), 38 deletions(-) diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index bd8b57a5c874..ace8f321a5a1 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -942,9 +942,9 @@ ENTRY(debug) /* Are we currently on the SYSENTER stack? */ movl PER_CPU_VAR(cpu_entry_area), %ecx - addl $CPU_ENTRY_AREA_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx - subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ - cmpl $SIZEOF_SYSENTER_stack, %ecx + addl $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx + subl %eax, %ecx /* ecx = (end of entry_stack) - esp */ + cmpl $SIZEOF_entry_stack, %ecx jb .Ldebug_from_sysenter_stack TRACE_IRQS_OFF @@ -986,9 +986,9 @@ ENTRY(nmi) /* Are we currently on the SYSENTER stack? */ movl PER_CPU_VAR(cpu_entry_area), %ecx - addl $CPU_ENTRY_AREA_SYSENTER_stack + SIZEOF_SYSENTER_stack, %ecx - subl %eax, %ecx /* ecx = (end of SYSENTER_stack) - esp */ - cmpl $SIZEOF_SYSENTER_stack, %ecx + addl $CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx + subl %eax, %ecx /* ecx = (end of entry_stack) - esp */ + cmpl $SIZEOF_entry_stack, %ecx jb .Lnmi_from_sysenter_stack /* Not on SYSENTER stack. */ diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 6abe3fcaece9..22c891c3b78d 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -154,8 +154,8 @@ END(native_usergs_sysret64) _entry_trampoline - CPU_ENTRY_AREA_entry_trampoline(%rip) /* The top word of the SYSENTER stack is hot and is usable as scratch space. */ -#define RSP_SCRATCH CPU_ENTRY_AREA_SYSENTER_stack + \ - SIZEOF_SYSENTER_stack - 8 + CPU_ENTRY_AREA +#define RSP_SCRATCH CPU_ENTRY_AREA_entry_stack + \ + SIZEOF_entry_stack - 8 + CPU_ENTRY_AREA ENTRY(entry_SYSCALL_64_trampoline) UNWIND_HINT_EMPTY diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 94fc4fa14127..8153b8d86a3c 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -56,10 +56,10 @@ struct cpu_entry_area { char gdt[PAGE_SIZE]; /* - * The GDT is just below SYSENTER_stack and thus serves (on x86_64) as + * The GDT is just below entry_stack and thus serves (on x86_64) as * a a read-only guard page. */ - struct SYSENTER_stack_page SYSENTER_stack_page; + struct entry_stack_page entry_stack_page; /* * On x86_64, the TSS is mapped RO. 
On x86_32, it's mapped RW because @@ -250,9 +250,9 @@ static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) return (struct cpu_entry_area *)__fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0)); } -static inline struct SYSENTER_stack *cpu_SYSENTER_stack(int cpu) +static inline struct entry_stack *cpu_entry_stack(int cpu) { - return &get_cpu_entry_area(cpu)->SYSENTER_stack_page.stack; + return &get_cpu_entry_area(cpu)->entry_stack_page.stack; } #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index da943411d3d8..9e482d8b0b97 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -336,12 +336,12 @@ struct x86_hw_tss { #define IO_BITMAP_OFFSET (offsetof(struct tss_struct, io_bitmap) - offsetof(struct tss_struct, x86_tss)) #define INVALID_IO_BITMAP_OFFSET 0x8000 -struct SYSENTER_stack { +struct entry_stack { unsigned long words[64]; }; -struct SYSENTER_stack_page { - struct SYSENTER_stack stack; +struct entry_stack_page { + struct entry_stack stack; } __aligned(PAGE_SIZE); struct tss_struct { diff --git a/arch/x86/include/asm/stacktrace.h b/arch/x86/include/asm/stacktrace.h index f8062bfd43a0..f73706878772 100644 --- a/arch/x86/include/asm/stacktrace.h +++ b/arch/x86/include/asm/stacktrace.h @@ -16,7 +16,7 @@ enum stack_type { STACK_TYPE_TASK, STACK_TYPE_IRQ, STACK_TYPE_SOFTIRQ, - STACK_TYPE_SYSENTER, + STACK_TYPE_ENTRY, STACK_TYPE_EXCEPTION, STACK_TYPE_EXCEPTION_LAST = STACK_TYPE_EXCEPTION + N_EXCEPTION_STACKS-1, }; @@ -29,7 +29,7 @@ struct stack_info { bool in_task_stack(unsigned long *stack, struct task_struct *task, struct stack_info *info); -bool in_sysenter_stack(unsigned long *stack, struct stack_info *info); +bool in_entry_stack(unsigned long *stack, struct stack_info *info); int get_stack_info(unsigned long *stack, struct task_struct *task, struct stack_info *info, unsigned long *visit_mask); diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index cd360a5e0dca..676b7cf4b62b 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -97,6 +97,6 @@ void common(void) { /* Layout info for cpu_entry_area */ OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss); OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline); - OFFSET(CPU_ENTRY_AREA_SYSENTER_stack, cpu_entry_area, SYSENTER_stack_page); - DEFINE(SIZEOF_SYSENTER_stack, sizeof(struct SYSENTER_stack)); + OFFSET(CPU_ENTRY_AREA_entry_stack, cpu_entry_area, entry_stack_page); + DEFINE(SIZEOF_entry_stack, sizeof(struct entry_stack)); } diff --git a/arch/x86/kernel/asm-offsets_32.c b/arch/x86/kernel/asm-offsets_32.c index 7d20d9c0b3d6..fa1261eefa16 100644 --- a/arch/x86/kernel/asm-offsets_32.c +++ b/arch/x86/kernel/asm-offsets_32.c @@ -48,7 +48,7 @@ void foo(void) /* Offset from the sysenter stack to tss.sp0 */ DEFINE(TSS_sysenter_sp0, offsetof(struct cpu_entry_area, tss.x86_tss.sp0) - - offsetofend(struct cpu_entry_area, SYSENTER_stack_page.stack)); + offsetofend(struct cpu_entry_area, entry_stack_page.stack)); #ifdef CONFIG_CC_STACKPROTECTOR BLANK(); diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 034900623adf..ed4acbce37a8 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -487,8 +487,8 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); #endif -static DEFINE_PER_CPU_PAGE_ALIGNED(struct SYSENTER_stack_page, - SYSENTER_stack_storage); +static 
DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, + entry_stack_storage); static void __init set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot) @@ -523,8 +523,8 @@ static void __init setup_cpu_entry_area(int cpu) #endif __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); - set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, SYSENTER_stack_page), - per_cpu_ptr(&SYSENTER_stack_storage, cpu), 1, + set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, entry_stack_page), + per_cpu_ptr(&entry_stack_storage, cpu), 1, PAGE_KERNEL); /* @@ -1323,7 +1323,7 @@ void enable_sep_cpu(void) tss->x86_tss.ss1 = __KERNEL_CS; wrmsr(MSR_IA32_SYSENTER_CS, tss->x86_tss.ss1, 0); - wrmsr(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_SYSENTER_stack(cpu) + 1), 0); + wrmsr(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1), 0); wrmsr(MSR_IA32_SYSENTER_EIP, (unsigned long)entry_SYSENTER_32, 0); put_cpu(); @@ -1440,7 +1440,7 @@ void syscall_init(void) * AMD doesn't allow SYSENTER in long mode (either 32- or 64-bit). */ wrmsrl_safe(MSR_IA32_SYSENTER_CS, (u64)__KERNEL_CS); - wrmsrl_safe(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_SYSENTER_stack(cpu) + 1)); + wrmsrl_safe(MSR_IA32_SYSENTER_ESP, (unsigned long)(cpu_entry_stack(cpu) + 1)); wrmsrl_safe(MSR_IA32_SYSENTER_EIP, (u64)entry_SYSENTER_compat); #else wrmsrl(MSR_CSTAR, (unsigned long)ignore_sysret); @@ -1655,7 +1655,7 @@ void cpu_init(void) */ set_tss_desc(cpu, &get_cpu_entry_area(cpu)->tss.x86_tss); load_TR_desc(); - load_sp0((unsigned long)(cpu_SYSENTER_stack(cpu) + 1)); + load_sp0((unsigned long)(cpu_entry_stack(cpu) + 1)); load_mm_ldt(&init_mm); diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index bbd6d986e2d0..1dd3f533d78c 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -43,9 +43,9 @@ bool in_task_stack(unsigned long *stack, struct task_struct *task, return true; } -bool in_sysenter_stack(unsigned long *stack, struct stack_info *info) +bool in_entry_stack(unsigned long *stack, struct stack_info *info) { - struct SYSENTER_stack *ss = cpu_SYSENTER_stack(smp_processor_id()); + struct entry_stack *ss = cpu_entry_stack(smp_processor_id()); void *begin = ss; void *end = ss + 1; @@ -53,7 +53,7 @@ bool in_sysenter_stack(unsigned long *stack, struct stack_info *info) if ((void *)stack < begin || (void *)stack >= end) return false; - info->type = STACK_TYPE_SYSENTER; + info->type = STACK_TYPE_ENTRY; info->begin = begin; info->end = end; info->next_sp = NULL; @@ -111,13 +111,13 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, * - task stack * - interrupt stack * - HW exception stacks (double fault, nmi, debug, mce) - * - SYSENTER stack + * - entry stack * * x86-32 can have up to four stacks: * - task stack * - softirq stack * - hardirq stack - * - SYSENTER stack + * - entry stack */ for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { const char *stack_name; diff --git a/arch/x86/kernel/dumpstack_32.c b/arch/x86/kernel/dumpstack_32.c index 5ff13a6b3680..04170f63e3a1 100644 --- a/arch/x86/kernel/dumpstack_32.c +++ b/arch/x86/kernel/dumpstack_32.c @@ -26,8 +26,8 @@ const char *stack_type_name(enum stack_type type) if (type == STACK_TYPE_SOFTIRQ) return "SOFTIRQ"; - if (type == STACK_TYPE_SYSENTER) - return "SYSENTER"; + if (type == STACK_TYPE_ENTRY) + return "ENTRY_TRAMPOLINE"; return NULL; } @@ -96,7 +96,7 @@ int get_stack_info(unsigned long *stack, struct task_struct *task, if (task != current) goto 
unknown; - if (in_sysenter_stack(stack, info)) + if (in_entry_stack(stack, info)) goto recursion_check; if (in_hardirq_stack(stack, info)) diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c index abc828f8c297..563e28d14f2c 100644 --- a/arch/x86/kernel/dumpstack_64.c +++ b/arch/x86/kernel/dumpstack_64.c @@ -37,8 +37,14 @@ const char *stack_type_name(enum stack_type type) if (type == STACK_TYPE_IRQ) return "IRQ"; - if (type == STACK_TYPE_SYSENTER) - return "SYSENTER"; + if (type == STACK_TYPE_ENTRY) { + /* + * On 64-bit, we have a generic entry stack that we + * use for all the kernel entry points, including + * SYSENTER. + */ + return "ENTRY_TRAMPOLINE"; + } if (type >= STACK_TYPE_EXCEPTION && type <= STACK_TYPE_EXCEPTION_LAST) return exception_stack_names[type - STACK_TYPE_EXCEPTION]; @@ -118,7 +124,7 @@ int get_stack_info(unsigned long *stack, struct task_struct *task, if (in_irq_stack(stack, info)) goto recursion_check; - if (in_sysenter_stack(stack, info)) + if (in_entry_stack(stack, info)) goto recursion_check; goto unknown; From 032fd2e383cb112789bcbb9a728dd4d96a1c280d Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:50 +0100 Subject: [PATCH 0750/1013] x86/uv: Use the right TLB-flush API commit 3e46e0f5ee3643a1239be9046c7ba6c66ca2b329 upstream. Since uv_flush_tlb_others() implements flush_tlb_others() which is about flushing user mappings, we should use __flush_tlb_single(), which too is about flushing user mappings. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Acked-by: Andrew Banman Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Mike Travis Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/platform/uv/tlb_uv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/platform/uv/tlb_uv.c b/arch/x86/platform/uv/tlb_uv.c index f44c0bc95aa2..8538a6723171 100644 --- a/arch/x86/platform/uv/tlb_uv.c +++ b/arch/x86/platform/uv/tlb_uv.c @@ -299,7 +299,7 @@ static void bau_process_message(struct msg_desc *mdp, struct bau_control *bcp, local_flush_tlb(); stat->d_alltlb++; } else { - __flush_tlb_one(msg->address); + __flush_tlb_single(msg->address); stat->d_onetlb++; } stat->d_requestee++; From de4c8bbd6e1b631fa5f38e75453e331244ad7028 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:51 +0100 Subject: [PATCH 0751/1013] x86/microcode: Dont abuse the TLB-flush interface commit 23cb7d46f371844c004784ad9552a57446f73e5a upstream. Commit: ec400ddeff20 ("x86/microcode_intel_early.c: Early update ucode on Intel's CPU") ... grubbed into tlbflush internals without coherent explanation. Since it says its a precaution and the SDM doesn't mention anything like this, take it out back. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: fenghua.yu@intel.com Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 19 ++++++------------- arch/x86/kernel/cpu/microcode/intel.c | 13 ------------- 2 files changed, 6 insertions(+), 26 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 509046cfa5ce..c2e45da4e540 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -246,20 +246,9 @@ static inline void __native_flush_tlb(void) preempt_enable(); } -static inline void __native_flush_tlb_global_irq_disabled(void) -{ - unsigned long cr4; - - cr4 = this_cpu_read(cpu_tlbstate.cr4); - /* clear PGE */ - native_write_cr4(cr4 & ~X86_CR4_PGE); - /* write old PGE again and flush TLBs */ - native_write_cr4(cr4); -} - static inline void __native_flush_tlb_global(void) { - unsigned long flags; + unsigned long cr4, flags; if (static_cpu_has(X86_FEATURE_INVPCID)) { /* @@ -277,7 +266,11 @@ static inline void __native_flush_tlb_global(void) */ raw_local_irq_save(flags); - __native_flush_tlb_global_irq_disabled(); + cr4 = this_cpu_read(cpu_tlbstate.cr4); + /* toggle PGE */ + native_write_cr4(cr4 ^ X86_CR4_PGE); + /* write old PGE again and flush TLBs */ + native_write_cr4(cr4); raw_local_irq_restore(flags); } diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c index 7dbcb7adf797..8ccdca6d3f9e 100644 --- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -565,15 +565,6 @@ static void print_ucode(struct ucode_cpu_info *uci) } #else -/* - * Flush global tlb. We only do this in x86_64 where paging has been enabled - * already and PGE should be enabled as well. - */ -static inline void flush_tlb_early(void) -{ - __native_flush_tlb_global_irq_disabled(); -} - static inline void print_ucode(struct ucode_cpu_info *uci) { struct microcode_intel *mc; @@ -602,10 +593,6 @@ static int apply_microcode_early(struct ucode_cpu_info *uci, bool early) if (rev != mc->hdr.rev) return -1; -#ifdef CONFIG_X86_64 - /* Flush global tlb. This is precaution. */ - flush_tlb_early(); -#endif uci->cpu_sig.rev = rev; if (early) From a0edc4947db9a3cbc6219143ff72e163ad7dbf80 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:49 +0100 Subject: [PATCH 0752/1013] x86/mm: Use __flush_tlb_one() for kernel memory commit a501686b2923ce6f2ff2b1d0d50682c6411baf72 upstream. __flush_tlb_single() is for user mappings, __flush_tlb_one() for kernel mappings. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/tlb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 3118392cdf75..0569987f6da6 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -551,7 +551,7 @@ static void do_kernel_range_flush(void *info) /* flush range by one by one 'invlpg' */ for (addr = f->start; addr < f->end; addr += PAGE_SIZE) - __flush_tlb_single(addr); + __flush_tlb_one(addr); } void flush_tlb_kernel_range(unsigned long start, unsigned long end) From 6472c50292d4e4b342f4ea4e3e46e7786ca5c99c Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:46 +0100 Subject: [PATCH 0753/1013] x86/mm: Remove superfluous barriers commit b5fc6d943808b570bdfbec80f40c6b3855f1c48b upstream. atomic64_inc_return() already implies smp_mb() before and after. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index c2e45da4e540..3e2227386abe 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -60,19 +60,13 @@ static inline void invpcid_flush_all_nonglobals(void) static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) { - u64 new_tlb_gen; - /* * Bump the generation count. This also serves as a full barrier * that synchronizes with switch_mm(): callers are required to order * their read of mm_cpumask after their writes to the paging * structures. */ - smp_mb__before_atomic(); - new_tlb_gen = atomic64_inc_return(&mm->context.tlb_gen); - smp_mb__after_atomic(); - - return new_tlb_gen; + return atomic64_inc_return(&mm->context.tlb_gen); } #ifdef CONFIG_PARAVIRT From 29606f10f399dd159ab6718d05060294b29a4481 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:52 +0100 Subject: [PATCH 0754/1013] x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what commit 3f67af51e56f291d7417d77c4f67cd774633c5e1 upstream. Per popular request.. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 3e2227386abe..552d581c8f9f 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -228,6 +228,9 @@ static inline void cr4_set_bits_and_update_boot(unsigned long mask) extern void initialize_tlbstate_and_flush(void); +/* + * flush the entire current user mapping + */ static inline void __native_flush_tlb(void) { /* @@ -240,6 +243,9 @@ static inline void __native_flush_tlb(void) preempt_enable(); } +/* + * flush everything + */ static inline void __native_flush_tlb_global(void) { unsigned long cr4, flags; @@ -269,17 +275,27 @@ static inline void __native_flush_tlb_global(void) raw_local_irq_restore(flags); } +/* + * flush one page in the user mapping + */ static inline void __native_flush_tlb_single(unsigned long addr) { asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); } +/* + * flush everything + */ static inline void __flush_tlb_all(void) { - if (boot_cpu_has(X86_FEATURE_PGE)) + if (boot_cpu_has(X86_FEATURE_PGE)) { __flush_tlb_global(); - else + } else { + /* + * !PGE -> !PCID (setup_pcid()), thus every flush is total. + */ __flush_tlb(); + } /* * Note: if we somehow had PCID but not PGE, then this wouldn't work -- @@ -290,6 +306,9 @@ static inline void __flush_tlb_all(void) */ } +/* + * flush one page in the kernel mapping + */ static inline void __flush_tlb_one(unsigned long addr) { count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE); From b72e0abe99abaa4e64db83776c11ff6847e829cf Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:54 +0100 Subject: [PATCH 0755/1013] x86/mm: Move the CR3 construction functions to tlbflush.h commit 50fb83a62cf472dc53ba23bd3f7bd6c1b2b3b53e upstream. For flushing the TLB, the ASID which has been programmed into the hardware must be known. That differs from what is in 'cpu_tlbstate'. Add functions to transform the 'cpu_tlbstate' values into to the one programmed into the hardware (CR3). It's not easy to include mmu_context.h into tlbflush.h, so just move the CR3 building over to tlbflush.h. Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/mmu_context.h | 29 +---------------------------- arch/x86/include/asm/tlbflush.h | 26 ++++++++++++++++++++++++++ arch/x86/mm/tlb.c | 8 ++++---- 3 files changed, 31 insertions(+), 32 deletions(-) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 5e25423bf9bb..5ede7cae1d67 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -290,33 +290,6 @@ static inline bool arch_vma_access_permitted(struct vm_area_struct *vma, return __pkru_allows_pkey(vma_pkey(vma), write); } -/* - * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID - * bits. This serves two purposes. It prevents a nasty situation in - * which PCID-unaware code saves CR3, loads some other value (with PCID - * == 0), and then restores CR3, thus corrupting the TLB for ASID 0 if - * the saved ASID was nonzero. It also means that any bugs involving - * loading a PCID-enabled CR3 with CR4.PCIDE off will trigger - * deterministically. - */ - -static inline unsigned long build_cr3(struct mm_struct *mm, u16 asid) -{ - if (static_cpu_has(X86_FEATURE_PCID)) { - VM_WARN_ON_ONCE(asid > 4094); - return __sme_pa(mm->pgd) | (asid + 1); - } else { - VM_WARN_ON_ONCE(asid != 0); - return __sme_pa(mm->pgd); - } -} - -static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid) -{ - VM_WARN_ON_ONCE(asid > 4094); - return __sme_pa(mm->pgd) | (asid + 1) | CR3_NOFLUSH; -} - /* * This can be used from process context to figure out what the value of * CR3 is without needing to do a (slow) __read_cr3(). @@ -326,7 +299,7 @@ static inline unsigned long build_cr3_noflush(struct mm_struct *mm, u16 asid) */ static inline unsigned long __get_current_cr3_fast(void) { - unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm), + unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); /* For now, be very restrictive about when this can be called. */ diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 552d581c8f9f..ee7925adfb57 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -69,6 +69,32 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) return atomic64_inc_return(&mm->context.tlb_gen); } +/* + * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits. + * This serves two purposes. It prevents a nasty situation in which + * PCID-unaware code saves CR3, loads some other value (with PCID == 0), + * and then restores CR3, thus corrupting the TLB for ASID 0 if the saved + * ASID was nonzero. It also means that any bugs involving loading a + * PCID-enabled CR3 with CR4.PCIDE off will trigger deterministically. 
+ */ +struct pgd_t; +static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) +{ + if (static_cpu_has(X86_FEATURE_PCID)) { + VM_WARN_ON_ONCE(asid > 4094); + return __sme_pa(pgd) | (asid + 1); + } else { + VM_WARN_ON_ONCE(asid != 0); + return __sme_pa(pgd); + } +} + +static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid) +{ + VM_WARN_ON_ONCE(asid > 4094); + return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH; +} + #ifdef CONFIG_PARAVIRT #include #else diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 0569987f6da6..0a1be3adc97e 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -128,7 +128,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, * isn't free. */ #ifdef CONFIG_DEBUG_VM - if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev, prev_asid))) { + if (WARN_ON_ONCE(__read_cr3() != build_cr3(real_prev->pgd, prev_asid))) { /* * If we were to BUG here, we'd be very likely to kill * the system so hard that we don't see the call trace. @@ -195,7 +195,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, if (need_flush) { this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); - write_cr3(build_cr3(next, new_asid)); + write_cr3(build_cr3(next->pgd, new_asid)); /* * NB: This gets called via leave_mm() in the idle path @@ -208,7 +208,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); } else { /* The new ASID is already up to date. */ - write_cr3(build_cr3_noflush(next, new_asid)); + write_cr3(build_cr3_noflush(next->pgd, new_asid)); /* See above wrt _rcuidle. */ trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0); @@ -288,7 +288,7 @@ void initialize_tlbstate_and_flush(void) !(cr4_read_shadow() & X86_CR4_PCIDE)); /* Force ASID 0 and force a TLB flush. */ - write_cr3(build_cr3(mm, 0)); + write_cr3(build_cr3(mm->pgd, 0)); /* Reinitialize tlbstate. */ this_cpu_write(cpu_tlbstate.loaded_mm_asid, 0); From 1765d0a565ee2585cf7b400e81be9a4da96fd7da Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:55 +0100 Subject: [PATCH 0756/1013] x86/mm: Remove hard-coded ASID limit checks commit cb0a9144a744e55207e24dcef812f05cd15a499a upstream. First, it's nice to remove the magic numbers. Second, PAGE_TABLE_ISOLATION is going to consume half of the available ASID space. The space is currently unused, but add a comment to spell out this new restriction. Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index ee7925adfb57..f88ccd3ae466 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -69,6 +69,22 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) return atomic64_inc_return(&mm->context.tlb_gen); } +/* There are 12 bits of space for ASIDS in CR3 */ +#define CR3_HW_ASID_BITS 12 +/* + * When enabled, PAGE_TABLE_ISOLATION consumes a single bit for + * user/kernel switches + */ +#define PTI_CONSUMED_ASID_BITS 0 + +#define CR3_AVAIL_ASID_BITS (CR3_HW_ASID_BITS - PTI_CONSUMED_ASID_BITS) +/* + * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account + * for them being zero-based. Another -1 is because ASID 0 is reserved for + * use by non-PCID-aware users. + */ +#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2) + /* * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits. * This serves two purposes. It prevents a nasty situation in which @@ -81,7 +97,7 @@ struct pgd_t; static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) { if (static_cpu_has(X86_FEATURE_PCID)) { - VM_WARN_ON_ONCE(asid > 4094); + VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); return __sme_pa(pgd) | (asid + 1); } else { VM_WARN_ON_ONCE(asid != 0); @@ -91,7 +107,7 @@ static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid) { - VM_WARN_ON_ONCE(asid > 4094); + VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH; } From acefb4516bcecc7d084205e00fe2de589d8259bc Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:56 +0100 Subject: [PATCH 0757/1013] x86/mm: Put MMU to hardware ASID translation in one place commit dd95f1a4b5ca904c78e6a097091eb21436478abb upstream. There are effectively two ASID types: 1. The one stored in the mmu_context that goes from 0..5 2. The one programmed into the hardware that goes from 1..6 This consolidates the locations where converting between the two (by doing a +1) to a single place which gives us a nice place to comment. PAGE_TABLE_ISOLATION will also need to, given an ASID, know which hardware ASID to flush for the userspace mapping. Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index f88ccd3ae466..8b27daff7a7f 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -85,20 +85,26 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) */ #define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2) -/* - * If PCID is on, ASID-aware code paths put the ASID+1 into the PCID bits. - * This serves two purposes. It prevents a nasty situation in which - * PCID-unaware code saves CR3, loads some other value (with PCID == 0), - * and then restores CR3, thus corrupting the TLB for ASID 0 if the saved - * ASID was nonzero. It also means that any bugs involving loading a - * PCID-enabled CR3 with CR4.PCIDE off will trigger deterministically. - */ +static inline u16 kern_pcid(u16 asid) +{ + VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); + /* + * If PCID is on, ASID-aware code paths put the ASID+1 into the + * PCID bits. This serves two purposes. It prevents a nasty + * situation in which PCID-unaware code saves CR3, loads some other + * value (with PCID == 0), and then restores CR3, thus corrupting + * the TLB for ASID 0 if the saved ASID was nonzero. It also means + * that any bugs involving loading a PCID-enabled CR3 with + * CR4.PCIDE off will trigger deterministically. + */ + return asid + 1; +} + struct pgd_t; static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) { if (static_cpu_has(X86_FEATURE_PCID)) { - VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); - return __sme_pa(pgd) | (asid + 1); + return __sme_pa(pgd) | kern_pcid(asid); } else { VM_WARN_ON_ONCE(asid != 0); return __sme_pa(pgd); @@ -108,7 +114,8 @@ static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid) { VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); - return __sme_pa(pgd) | (asid + 1) | CR3_NOFLUSH; + VM_WARN_ON_ONCE(!this_cpu_has(X86_FEATURE_PCID)); + return __sme_pa(pgd) | kern_pcid(asid) | CR3_NOFLUSH; } #ifdef CONFIG_PARAVIRT From b6167aeb9faf5966daca3f9f973e1b428a43510c Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:47 +0100 Subject: [PATCH 0758/1013] x86/mm: Create asm/invpcid.h commit 1a3b0caeb77edeac5ce5fa05e6a61c474c9a9745 upstream. Unclutter tlbflush.h a little. Signed-off-by: Peter Zijlstra (Intel) Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/invpcid.h | 53 +++++++++++++++++++++++++++++++++ arch/x86/include/asm/tlbflush.h | 49 +----------------------------- 2 files changed, 54 insertions(+), 48 deletions(-) create mode 100644 arch/x86/include/asm/invpcid.h diff --git a/arch/x86/include/asm/invpcid.h b/arch/x86/include/asm/invpcid.h new file mode 100644 index 000000000000..989cfa86de85 --- /dev/null +++ b/arch/x86/include/asm/invpcid.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_INVPCID +#define _ASM_X86_INVPCID + +static inline void __invpcid(unsigned long pcid, unsigned long addr, + unsigned long type) +{ + struct { u64 d[2]; } desc = { { pcid, addr } }; + + /* + * The memory clobber is because the whole point is to invalidate + * stale TLB entries and, especially if we're flushing global + * mappings, we don't want the compiler to reorder any subsequent + * memory accesses before the TLB flush. + * + * The hex opcode is invpcid (%ecx), %eax in 32-bit mode and + * invpcid (%rcx), %rax in long mode. + */ + asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01" + : : "m" (desc), "a" (type), "c" (&desc) : "memory"); +} + +#define INVPCID_TYPE_INDIV_ADDR 0 +#define INVPCID_TYPE_SINGLE_CTXT 1 +#define INVPCID_TYPE_ALL_INCL_GLOBAL 2 +#define INVPCID_TYPE_ALL_NON_GLOBAL 3 + +/* Flush all mappings for a given pcid and addr, not including globals. */ +static inline void invpcid_flush_one(unsigned long pcid, + unsigned long addr) +{ + __invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR); +} + +/* Flush all mappings for a given PCID, not including globals. */ +static inline void invpcid_flush_single_context(unsigned long pcid) +{ + __invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT); +} + +/* Flush all mappings, including globals, for all PCIDs. */ +static inline void invpcid_flush_all(void) +{ + __invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL); +} + +/* Flush all mappings for all PCIDs except globals. */ +static inline void invpcid_flush_all_nonglobals(void) +{ + __invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL); +} + +#endif /* _ASM_X86_INVPCID */ diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 8b27daff7a7f..171b429f43a2 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -9,54 +9,7 @@ #include #include #include - -static inline void __invpcid(unsigned long pcid, unsigned long addr, - unsigned long type) -{ - struct { u64 d[2]; } desc = { { pcid, addr } }; - - /* - * The memory clobber is because the whole point is to invalidate - * stale TLB entries and, especially if we're flushing global - * mappings, we don't want the compiler to reorder any subsequent - * memory accesses before the TLB flush. - * - * The hex opcode is invpcid (%ecx), %eax in 32-bit mode and - * invpcid (%rcx), %rax in long mode. - */ - asm volatile (".byte 0x66, 0x0f, 0x38, 0x82, 0x01" - : : "m" (desc), "a" (type), "c" (&desc) : "memory"); -} - -#define INVPCID_TYPE_INDIV_ADDR 0 -#define INVPCID_TYPE_SINGLE_CTXT 1 -#define INVPCID_TYPE_ALL_INCL_GLOBAL 2 -#define INVPCID_TYPE_ALL_NON_GLOBAL 3 - -/* Flush all mappings for a given pcid and addr, not including globals. 
*/ -static inline void invpcid_flush_one(unsigned long pcid, - unsigned long addr) -{ - __invpcid(pcid, addr, INVPCID_TYPE_INDIV_ADDR); -} - -/* Flush all mappings for a given PCID, not including globals. */ -static inline void invpcid_flush_single_context(unsigned long pcid) -{ - __invpcid(pcid, 0, INVPCID_TYPE_SINGLE_CTXT); -} - -/* Flush all mappings, including globals, for all PCIDs. */ -static inline void invpcid_flush_all(void) -{ - __invpcid(0, 0, INVPCID_TYPE_ALL_INCL_GLOBAL); -} - -/* Flush all mappings for all PCIDs except globals. */ -static inline void invpcid_flush_all_nonglobals(void) -{ - __invpcid(0, 0, INVPCID_TYPE_ALL_NON_GLOBAL); -} +#include static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) { From 1b0eddf0a1d1beabc2f3b23bf807758667c30fc7 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 20 Dec 2017 18:28:54 +0100 Subject: [PATCH 0759/1013] x86/cpu_entry_area: Move it to a separate unit commit ed1bbc40a0d10e0c5c74fe7bdc6298295cf40255 upstream. Separate the cpu_entry_area code out of cpu/common.c and the fixmap. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpu_entry_area.h | 52 +++++++++++++ arch/x86/include/asm/fixmap.h | 41 +--------- arch/x86/kernel/cpu/common.c | 94 ----------------------- arch/x86/kernel/traps.c | 1 + arch/x86/mm/Makefile | 2 +- arch/x86/mm/cpu_entry_area.c | 104 ++++++++++++++++++++++++++ 6 files changed, 159 insertions(+), 135 deletions(-) create mode 100644 arch/x86/include/asm/cpu_entry_area.h create mode 100644 arch/x86/mm/cpu_entry_area.c diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h new file mode 100644 index 000000000000..5471826803af --- /dev/null +++ b/arch/x86/include/asm/cpu_entry_area.h @@ -0,0 +1,52 @@ +// SPDX-License-Identifier: GPL-2.0 + +#ifndef _ASM_X86_CPU_ENTRY_AREA_H +#define _ASM_X86_CPU_ENTRY_AREA_H + +#include +#include + +/* + * cpu_entry_area is a percpu region that contains things needed by the CPU + * and early entry/exit code. Real types aren't used for all fields here + * to avoid circular header dependencies. + * + * Every field is a virtual alias of some other allocated backing store. + * There is no direct allocation of a struct cpu_entry_area. + */ +struct cpu_entry_area { + char gdt[PAGE_SIZE]; + + /* + * The GDT is just below entry_stack and thus serves (on x86_64) as + * a a read-only guard page. + */ + struct entry_stack_page entry_stack_page; + + /* + * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because + * we need task switches to work, and task switches write to the TSS. + */ + struct tss_struct tss; + + char entry_trampoline[PAGE_SIZE]; + +#ifdef CONFIG_X86_64 + /* + * Exception stacks used for IST entries. + * + * In the future, this should have a separate slot for each stack + * with guard pages between them. 
+ */ + char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]; +#endif +}; + +#define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area)) +#define CPU_ENTRY_AREA_PAGES (CPU_ENTRY_AREA_SIZE / PAGE_SIZE) + +DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); + +extern void setup_cpu_entry_areas(void); + +#endif diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 8153b8d86a3c..fb801662a230 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -25,6 +25,7 @@ #else #include #endif +#include /* * We can't declare FIXADDR_TOP as variable for x86_64 because vsyscall @@ -44,46 +45,6 @@ extern unsigned long __FIXADDR_TOP; PAGE_SIZE) #endif -/* - * cpu_entry_area is a percpu region in the fixmap that contains things - * needed by the CPU and early entry/exit code. Real types aren't used - * for all fields here to avoid circular header dependencies. - * - * Every field is a virtual alias of some other allocated backing store. - * There is no direct allocation of a struct cpu_entry_area. - */ -struct cpu_entry_area { - char gdt[PAGE_SIZE]; - - /* - * The GDT is just below entry_stack and thus serves (on x86_64) as - * a a read-only guard page. - */ - struct entry_stack_page entry_stack_page; - - /* - * On x86_64, the TSS is mapped RO. On x86_32, it's mapped RW because - * we need task switches to work, and task switches write to the TSS. - */ - struct tss_struct tss; - - char entry_trampoline[PAGE_SIZE]; - -#ifdef CONFIG_X86_64 - /* - * Exception stacks used for IST entries. - * - * In the future, this should have a separate slot for each stack - * with guard pages between them. - */ - char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]; -#endif -}; - -#define CPU_ENTRY_AREA_PAGES (sizeof(struct cpu_entry_area) / PAGE_SIZE) - -extern void setup_cpu_entry_areas(void); - /* * Here we define all the compile-time 'special' virtual * addresses. The point is to have a constant address at diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index ed4acbce37a8..8ddcfa4d4165 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -482,102 +482,8 @@ static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = { [0 ... N_EXCEPTION_STACKS - 1] = EXCEPTION_STKSZ, [DEBUG_STACK - 1] = DEBUG_STKSZ }; - -static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks - [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); -#endif - -static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, - entry_stack_storage); - -static void __init -set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot) -{ - for ( ; pages; pages--, idx--, ptr += PAGE_SIZE) - __set_fixmap(idx, per_cpu_ptr_to_phys(ptr), prot); -} - -/* Setup the fixmap mappings only once per-processor */ -static void __init setup_cpu_entry_area(int cpu) -{ -#ifdef CONFIG_X86_64 - extern char _entry_trampoline[]; - - /* On 64-bit systems, we use a read-only fixmap GDT and TSS. */ - pgprot_t gdt_prot = PAGE_KERNEL_RO; - pgprot_t tss_prot = PAGE_KERNEL_RO; -#else - /* - * On native 32-bit systems, the GDT cannot be read-only because - * our double fault handler uses a task gate, and entering through - * a task gate needs to change an available TSS to busy. If the - * GDT is read-only, that will triple fault. The TSS cannot be - * read-only because the CPU writes to it on task switches. - * - * On Xen PV, the GDT must be read-only because the hypervisor - * requires it. 
- */ - pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ? - PAGE_KERNEL_RO : PAGE_KERNEL; - pgprot_t tss_prot = PAGE_KERNEL; -#endif - - __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); - set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, entry_stack_page), - per_cpu_ptr(&entry_stack_storage, cpu), 1, - PAGE_KERNEL); - - /* - * The Intel SDM says (Volume 3, 7.2.1): - * - * Avoid placing a page boundary in the part of the TSS that the - * processor reads during a task switch (the first 104 bytes). The - * processor may not correctly perform address translations if a - * boundary occurs in this area. During a task switch, the processor - * reads and writes into the first 104 bytes of each TSS (using - * contiguous physical addresses beginning with the physical address - * of the first byte of the TSS). So, after TSS access begins, if - * part of the 104 bytes is not physically contiguous, the processor - * will access incorrect information without generating a page-fault - * exception. - * - * There are also a lot of errata involving the TSS spanning a page - * boundary. Assert that we're not doing that. - */ - BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^ - offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK); - BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0); - set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss), - &per_cpu(cpu_tss_rw, cpu), - sizeof(struct tss_struct) / PAGE_SIZE, - tss_prot); - -#ifdef CONFIG_X86_32 - per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu); #endif -#ifdef CONFIG_X86_64 - BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0); - BUILD_BUG_ON(sizeof(exception_stacks) != - sizeof(((struct cpu_entry_area *)0)->exception_stacks)); - set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, exception_stacks), - &per_cpu(exception_stacks, cpu), - sizeof(exception_stacks) / PAGE_SIZE, - PAGE_KERNEL); - - __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline), - __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX); -#endif -} - -void __init setup_cpu_entry_areas(void) -{ - unsigned int cpu; - - for_each_possible_cpu(cpu) - setup_cpu_entry_area(cpu); -} - /* Load the original GDT from the per-cpu structure */ void load_direct_gdt(int cpu) { diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 74136fd16f49..464daed6894f 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -52,6 +52,7 @@ #include #include #include +#include #include #include #include diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 7ba7f3d7f477..2e0017af8f9b 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -10,7 +10,7 @@ CFLAGS_REMOVE_mem_encrypt.o = -pg endif obj-y := init.o init_$(BITS).o fault.o ioremap.o extable.o pageattr.o mmap.o \ - pat.o pgtable.o physaddr.o setup_nx.o tlb.o + pat.o pgtable.o physaddr.o setup_nx.o tlb.o cpu_entry_area.o # Make sure __phys_addr has no stackprotector nostackp := $(call cc-option, -fno-stack-protector) diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c new file mode 100644 index 000000000000..235ff9cfaaf4 --- /dev/null +++ b/arch/x86/mm/cpu_entry_area.c @@ -0,0 +1,104 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include + +#include +#include +#include +#include + +static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage); + +#ifdef CONFIG_X86_64 +static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks + [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); +#endif + +static 
void __init +set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot) +{ + for ( ; pages; pages--, idx--, ptr += PAGE_SIZE) + __set_fixmap(idx, per_cpu_ptr_to_phys(ptr), prot); +} + +/* Setup the fixmap mappings only once per-processor */ +static void __init setup_cpu_entry_area(int cpu) +{ +#ifdef CONFIG_X86_64 + extern char _entry_trampoline[]; + + /* On 64-bit systems, we use a read-only fixmap GDT and TSS. */ + pgprot_t gdt_prot = PAGE_KERNEL_RO; + pgprot_t tss_prot = PAGE_KERNEL_RO; +#else + /* + * On native 32-bit systems, the GDT cannot be read-only because + * our double fault handler uses a task gate, and entering through + * a task gate needs to change an available TSS to busy. If the + * GDT is read-only, that will triple fault. The TSS cannot be + * read-only because the CPU writes to it on task switches. + * + * On Xen PV, the GDT must be read-only because the hypervisor + * requires it. + */ + pgprot_t gdt_prot = boot_cpu_has(X86_FEATURE_XENPV) ? + PAGE_KERNEL_RO : PAGE_KERNEL; + pgprot_t tss_prot = PAGE_KERNEL; +#endif + + __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); + set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, entry_stack_page), + per_cpu_ptr(&entry_stack_storage, cpu), 1, + PAGE_KERNEL); + + /* + * The Intel SDM says (Volume 3, 7.2.1): + * + * Avoid placing a page boundary in the part of the TSS that the + * processor reads during a task switch (the first 104 bytes). The + * processor may not correctly perform address translations if a + * boundary occurs in this area. During a task switch, the processor + * reads and writes into the first 104 bytes of each TSS (using + * contiguous physical addresses beginning with the physical address + * of the first byte of the TSS). So, after TSS access begins, if + * part of the 104 bytes is not physically contiguous, the processor + * will access incorrect information without generating a page-fault + * exception. + * + * There are also a lot of errata involving the TSS spanning a page + * boundary. Assert that we're not doing that. + */ + BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^ + offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK); + BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0); + set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss), + &per_cpu(cpu_tss_rw, cpu), + sizeof(struct tss_struct) / PAGE_SIZE, + tss_prot); + +#ifdef CONFIG_X86_32 + per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu); +#endif + +#ifdef CONFIG_X86_64 + BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0); + BUILD_BUG_ON(sizeof(exception_stacks) != + sizeof(((struct cpu_entry_area *)0)->exception_stacks)); + set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, exception_stacks), + &per_cpu(exception_stacks, cpu), + sizeof(exception_stacks) / PAGE_SIZE, + PAGE_KERNEL); + + __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline), + __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX); +#endif +} + +void __init setup_cpu_entry_areas(void) +{ + unsigned int cpu; + + for_each_possible_cpu(cpu) + setup_cpu_entry_area(cpu); +} From 3440093266b786d8b8b2506ea788f4462437cd7a Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 20 Dec 2017 18:51:31 +0100 Subject: [PATCH 0760/1013] x86/cpu_entry_area: Move it out of the fixmap commit 92a0f81d89571e3e8759366e050ee05cc545ef99 upstream. Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big and 0-day already hit a case where the fixmap PTEs were cleared by cleanup_highmap(). 
Aside of that the fixmap API is a pain as it's all backwards. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 2 + arch/x86/include/asm/cpu_entry_area.h | 18 ++++++- arch/x86/include/asm/desc.h | 1 + arch/x86/include/asm/fixmap.h | 32 +----------- arch/x86/include/asm/pgtable_32_types.h | 15 ++++-- arch/x86/include/asm/pgtable_64_types.h | 47 +++++++++++------- arch/x86/kernel/dumpstack.c | 1 + arch/x86/kernel/traps.c | 5 +- arch/x86/mm/cpu_entry_area.c | 66 +++++++++++++++++++------ arch/x86/mm/dump_pagetables.c | 6 ++- arch/x86/mm/init_32.c | 6 +++ arch/x86/mm/kasan_init_64.c | 29 ++++++----- arch/x86/mm/pgtable_32.c | 1 + arch/x86/xen/mmu_pv.c | 2 - 14 files changed, 143 insertions(+), 88 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 63a41671d25b..51101708a03a 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -12,6 +12,7 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) ... unused hole ... ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB) ... unused hole ... +fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space @@ -35,6 +36,7 @@ ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) ... unused hole ... ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB) ... unused hole ... +fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... 
ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h index 5471826803af..2fbc69a0916e 100644 --- a/arch/x86/include/asm/cpu_entry_area.h +++ b/arch/x86/include/asm/cpu_entry_area.h @@ -43,10 +43,26 @@ struct cpu_entry_area { }; #define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area)) -#define CPU_ENTRY_AREA_PAGES (CPU_ENTRY_AREA_SIZE / PAGE_SIZE) +#define CPU_ENTRY_AREA_TOT_SIZE (CPU_ENTRY_AREA_SIZE * NR_CPUS) DECLARE_PER_CPU(struct cpu_entry_area *, cpu_entry_area); extern void setup_cpu_entry_areas(void); +extern void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags); + +#define CPU_ENTRY_AREA_RO_IDT CPU_ENTRY_AREA_BASE +#define CPU_ENTRY_AREA_PER_CPU (CPU_ENTRY_AREA_RO_IDT + PAGE_SIZE) + +#define CPU_ENTRY_AREA_RO_IDT_VADDR ((void *)CPU_ENTRY_AREA_RO_IDT) + +#define CPU_ENTRY_AREA_MAP_SIZE \ + (CPU_ENTRY_AREA_PER_CPU + CPU_ENTRY_AREA_TOT_SIZE - CPU_ENTRY_AREA_BASE) + +extern struct cpu_entry_area *get_cpu_entry_area(int cpu); + +static inline struct entry_stack *cpu_entry_stack(int cpu) +{ + return &get_cpu_entry_area(cpu)->entry_stack_page.stack; +} #endif diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index 2ace1f90d138..bc359dd2f7f6 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index fb801662a230..64c4a30e0d39 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -25,7 +25,6 @@ #else #include #endif -#include /* * We can't declare FIXADDR_TOP as variable for x86_64 because vsyscall @@ -84,7 +83,6 @@ enum fixed_addresses { FIX_IO_APIC_BASE_0, FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS - 1, #endif - FIX_RO_IDT, /* Virtual mapping for read-only IDT */ #ifdef CONFIG_X86_32 FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */ FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1, @@ -100,9 +98,6 @@ enum fixed_addresses { #ifdef CONFIG_X86_INTEL_MID FIX_LNW_VRTC, #endif - /* Fixmap entries to remap the GDTs, one per processor. 
*/ - FIX_CPU_ENTRY_AREA_TOP, - FIX_CPU_ENTRY_AREA_BOTTOM = FIX_CPU_ENTRY_AREA_TOP + (CPU_ENTRY_AREA_PAGES * NR_CPUS) - 1, #ifdef CONFIG_ACPI_APEI_GHES /* Used for GHES mapping from assorted contexts */ @@ -143,7 +138,7 @@ enum fixed_addresses { extern void reserve_top_address(unsigned long reserve); #define FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT) -#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) +#define FIXADDR_START (FIXADDR_TOP - FIXADDR_SIZE) extern int fixmaps_set; @@ -191,30 +186,5 @@ void __init *early_memremap_decrypted_wp(resource_size_t phys_addr, void __early_set_fixmap(enum fixed_addresses idx, phys_addr_t phys, pgprot_t flags); -static inline unsigned int __get_cpu_entry_area_page_index(int cpu, int page) -{ - BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0); - - return FIX_CPU_ENTRY_AREA_BOTTOM - cpu*CPU_ENTRY_AREA_PAGES - page; -} - -#define __get_cpu_entry_area_offset_index(cpu, offset) ({ \ - BUILD_BUG_ON(offset % PAGE_SIZE != 0); \ - __get_cpu_entry_area_page_index(cpu, offset / PAGE_SIZE); \ - }) - -#define get_cpu_entry_area_index(cpu, field) \ - __get_cpu_entry_area_offset_index((cpu), offsetof(struct cpu_entry_area, field)) - -static inline struct cpu_entry_area *get_cpu_entry_area(int cpu) -{ - return (struct cpu_entry_area *)__fix_to_virt(__get_cpu_entry_area_page_index(cpu, 0)); -} - -static inline struct entry_stack *cpu_entry_stack(int cpu) -{ - return &get_cpu_entry_area(cpu)->entry_stack_page.stack; -} - #endif /* !__ASSEMBLY__ */ #endif /* _ASM_X86_FIXMAP_H */ diff --git a/arch/x86/include/asm/pgtable_32_types.h b/arch/x86/include/asm/pgtable_32_types.h index f2ca9b28fd68..ce245b0cdfca 100644 --- a/arch/x86/include/asm/pgtable_32_types.h +++ b/arch/x86/include/asm/pgtable_32_types.h @@ -38,13 +38,22 @@ extern bool __vmalloc_start_set; /* set once high_memory is set */ #define LAST_PKMAP 1024 #endif -#define PKMAP_BASE ((FIXADDR_START - PAGE_SIZE * (LAST_PKMAP + 1)) \ - & PMD_MASK) +/* + * Define this here and validate with BUILD_BUG_ON() in pgtable_32.c + * to avoid include recursion hell + */ +#define CPU_ENTRY_AREA_PAGES (NR_CPUS * 40) + +#define CPU_ENTRY_AREA_BASE \ + ((FIXADDR_START - PAGE_SIZE * (CPU_ENTRY_AREA_PAGES + 1)) & PMD_MASK) + +#define PKMAP_BASE \ + ((CPU_ENTRY_AREA_BASE - PAGE_SIZE) & PMD_MASK) #ifdef CONFIG_HIGHMEM # define VMALLOC_END (PKMAP_BASE - 2 * PAGE_SIZE) #else -# define VMALLOC_END (FIXADDR_START - 2 * PAGE_SIZE) +# define VMALLOC_END (CPU_ENTRY_AREA_BASE - 2 * PAGE_SIZE) #endif #define MODULES_VADDR VMALLOC_START diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 6d5f45dcd4a1..3d27831bc58d 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -76,32 +76,41 @@ typedef struct { pteval_t pte; } pte_t; #define PGDIR_MASK (~(PGDIR_SIZE - 1)) /* See Documentation/x86/x86_64/mm.txt for a description of the memory map. 
*/ -#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL) +#define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL) + #ifdef CONFIG_X86_5LEVEL -#define VMALLOC_SIZE_TB _AC(16384, UL) -#define __VMALLOC_BASE _AC(0xff92000000000000, UL) -#define __VMEMMAP_BASE _AC(0xffd4000000000000, UL) +# define VMALLOC_SIZE_TB _AC(16384, UL) +# define __VMALLOC_BASE _AC(0xff92000000000000, UL) +# define __VMEMMAP_BASE _AC(0xffd4000000000000, UL) #else -#define VMALLOC_SIZE_TB _AC(32, UL) -#define __VMALLOC_BASE _AC(0xffffc90000000000, UL) -#define __VMEMMAP_BASE _AC(0xffffea0000000000, UL) +# define VMALLOC_SIZE_TB _AC(32, UL) +# define __VMALLOC_BASE _AC(0xffffc90000000000, UL) +# define __VMEMMAP_BASE _AC(0xffffea0000000000, UL) #endif + #ifdef CONFIG_RANDOMIZE_MEMORY -#define VMALLOC_START vmalloc_base -#define VMEMMAP_START vmemmap_base +# define VMALLOC_START vmalloc_base +# define VMEMMAP_START vmemmap_base #else -#define VMALLOC_START __VMALLOC_BASE -#define VMEMMAP_START __VMEMMAP_BASE +# define VMALLOC_START __VMALLOC_BASE +# define VMEMMAP_START __VMEMMAP_BASE #endif /* CONFIG_RANDOMIZE_MEMORY */ -#define VMALLOC_END (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL)) -#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) + +#define VMALLOC_END (VMALLOC_START + _AC((VMALLOC_SIZE_TB << 40) - 1, UL)) + +#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) /* The module sections ends with the start of the fixmap */ -#define MODULES_END __fix_to_virt(__end_of_fixed_addresses + 1) -#define MODULES_LEN (MODULES_END - MODULES_VADDR) -#define ESPFIX_PGD_ENTRY _AC(-2, UL) -#define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT) -#define EFI_VA_START ( -4 * (_AC(1, UL) << 30)) -#define EFI_VA_END (-68 * (_AC(1, UL) << 30)) +#define MODULES_END __fix_to_virt(__end_of_fixed_addresses + 1) +#define MODULES_LEN (MODULES_END - MODULES_VADDR) + +#define ESPFIX_PGD_ENTRY _AC(-2, UL) +#define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT) + +#define CPU_ENTRY_AREA_PGD _AC(-3, UL) +#define CPU_ENTRY_AREA_BASE (CPU_ENTRY_AREA_PGD << P4D_SHIFT) + +#define EFI_VA_START ( -4 * (_AC(1, UL) << 30)) +#define EFI_VA_END (-68 * (_AC(1, UL) << 30)) #define EARLY_DYNAMIC_PAGE_TABLES 64 diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 1dd3f533d78c..36b17e0febe8 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -18,6 +18,7 @@ #include #include +#include #include #include diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 464daed6894f..7c16fe0b60c2 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -951,8 +951,9 @@ void __init trap_init(void) * "sidt" instruction will not leak the location of the kernel, and * to defend the IDT against arbitrary memory write vulnerabilities. 
* It will be reloaded in cpu_init() */ - __set_fixmap(FIX_RO_IDT, __pa_symbol(idt_table), PAGE_KERNEL_RO); - idt_descr.address = fix_to_virt(FIX_RO_IDT); + cea_set_pte(CPU_ENTRY_AREA_RO_IDT_VADDR, __pa_symbol(idt_table), + PAGE_KERNEL_RO); + idt_descr.address = CPU_ENTRY_AREA_RO_IDT; /* * Should be a barrier for any external CPU state: diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c index 235ff9cfaaf4..21e8b595cbb1 100644 --- a/arch/x86/mm/cpu_entry_area.c +++ b/arch/x86/mm/cpu_entry_area.c @@ -15,11 +15,27 @@ static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks [(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]); #endif +struct cpu_entry_area *get_cpu_entry_area(int cpu) +{ + unsigned long va = CPU_ENTRY_AREA_PER_CPU + cpu * CPU_ENTRY_AREA_SIZE; + BUILD_BUG_ON(sizeof(struct cpu_entry_area) % PAGE_SIZE != 0); + + return (struct cpu_entry_area *) va; +} +EXPORT_SYMBOL(get_cpu_entry_area); + +void cea_set_pte(void *cea_vaddr, phys_addr_t pa, pgprot_t flags) +{ + unsigned long va = (unsigned long) cea_vaddr; + + set_pte_vaddr(va, pfn_pte(pa >> PAGE_SHIFT, flags)); +} + static void __init -set_percpu_fixmap_pages(int idx, void *ptr, int pages, pgprot_t prot) +cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot) { - for ( ; pages; pages--, idx--, ptr += PAGE_SIZE) - __set_fixmap(idx, per_cpu_ptr_to_phys(ptr), prot); + for ( ; pages; pages--, cea_vaddr+= PAGE_SIZE, ptr += PAGE_SIZE) + cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot); } /* Setup the fixmap mappings only once per-processor */ @@ -47,10 +63,12 @@ static void __init setup_cpu_entry_area(int cpu) pgprot_t tss_prot = PAGE_KERNEL; #endif - __set_fixmap(get_cpu_entry_area_index(cpu, gdt), get_cpu_gdt_paddr(cpu), gdt_prot); - set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, entry_stack_page), - per_cpu_ptr(&entry_stack_storage, cpu), 1, - PAGE_KERNEL); + cea_set_pte(&get_cpu_entry_area(cpu)->gdt, get_cpu_gdt_paddr(cpu), + gdt_prot); + + cea_map_percpu_pages(&get_cpu_entry_area(cpu)->entry_stack_page, + per_cpu_ptr(&entry_stack_storage, cpu), 1, + PAGE_KERNEL); /* * The Intel SDM says (Volume 3, 7.2.1): @@ -72,10 +90,9 @@ static void __init setup_cpu_entry_area(int cpu) BUILD_BUG_ON((offsetof(struct tss_struct, x86_tss) ^ offsetofend(struct tss_struct, x86_tss)) & PAGE_MASK); BUILD_BUG_ON(sizeof(struct tss_struct) % PAGE_SIZE != 0); - set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, tss), - &per_cpu(cpu_tss_rw, cpu), - sizeof(struct tss_struct) / PAGE_SIZE, - tss_prot); + cea_map_percpu_pages(&get_cpu_entry_area(cpu)->tss, + &per_cpu(cpu_tss_rw, cpu), + sizeof(struct tss_struct) / PAGE_SIZE, tss_prot); #ifdef CONFIG_X86_32 per_cpu(cpu_entry_area, cpu) = get_cpu_entry_area(cpu); @@ -85,20 +102,37 @@ static void __init setup_cpu_entry_area(int cpu) BUILD_BUG_ON(sizeof(exception_stacks) % PAGE_SIZE != 0); BUILD_BUG_ON(sizeof(exception_stacks) != sizeof(((struct cpu_entry_area *)0)->exception_stacks)); - set_percpu_fixmap_pages(get_cpu_entry_area_index(cpu, exception_stacks), - &per_cpu(exception_stacks, cpu), - sizeof(exception_stacks) / PAGE_SIZE, - PAGE_KERNEL); + cea_map_percpu_pages(&get_cpu_entry_area(cpu)->exception_stacks, + &per_cpu(exception_stacks, cpu), + sizeof(exception_stacks) / PAGE_SIZE, PAGE_KERNEL); - __set_fixmap(get_cpu_entry_area_index(cpu, entry_trampoline), + cea_set_pte(&get_cpu_entry_area(cpu)->entry_trampoline, __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX); #endif } +static __init void setup_cpu_entry_area_ptes(void) +{ +#ifdef 
CONFIG_X86_32 + unsigned long start, end; + + BUILD_BUG_ON(CPU_ENTRY_AREA_PAGES * PAGE_SIZE < CPU_ENTRY_AREA_MAP_SIZE); + BUG_ON(CPU_ENTRY_AREA_BASE & ~PMD_MASK); + + start = CPU_ENTRY_AREA_BASE; + end = start + CPU_ENTRY_AREA_MAP_SIZE; + + for (; start < end; start += PMD_SIZE) + populate_extra_pte(start); +#endif +} + void __init setup_cpu_entry_areas(void) { unsigned int cpu; + setup_cpu_entry_area_ptes(); + for_each_possible_cpu(cpu) setup_cpu_entry_area(cpu); } diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index fdf09d8f98da..43dedbfb7257 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -58,6 +58,7 @@ enum address_markers_idx { KASAN_SHADOW_START_NR, KASAN_SHADOW_END_NR, #endif + CPU_ENTRY_AREA_NR, #ifdef CONFIG_X86_ESPFIX64 ESPFIX_START_NR, #endif @@ -81,6 +82,7 @@ static struct addr_marker address_markers[] = { [KASAN_SHADOW_START_NR] = { KASAN_SHADOW_START, "KASAN shadow" }, [KASAN_SHADOW_END_NR] = { KASAN_SHADOW_END, "KASAN shadow end" }, #endif + [CPU_ENTRY_AREA_NR] = { CPU_ENTRY_AREA_BASE,"CPU entry Area" }, #ifdef CONFIG_X86_ESPFIX64 [ESPFIX_START_NR] = { ESPFIX_BASE_ADDR, "ESPfix Area", 16 }, #endif @@ -104,6 +106,7 @@ enum address_markers_idx { #ifdef CONFIG_HIGHMEM PKMAP_BASE_NR, #endif + CPU_ENTRY_AREA_NR, FIXADDR_START_NR, END_OF_SPACE_NR, }; @@ -116,6 +119,7 @@ static struct addr_marker address_markers[] = { #ifdef CONFIG_HIGHMEM [PKMAP_BASE_NR] = { 0UL, "Persistent kmap() Area" }, #endif + [CPU_ENTRY_AREA_NR] = { 0UL, "CPU entry area" }, [FIXADDR_START_NR] = { 0UL, "Fixmap area" }, [END_OF_SPACE_NR] = { -1, NULL } }; @@ -541,8 +545,8 @@ static int __init pt_dump_init(void) address_markers[PKMAP_BASE_NR].start_address = PKMAP_BASE; # endif address_markers[FIXADDR_START_NR].start_address = FIXADDR_START; + address_markers[CPU_ENTRY_AREA_NR].start_address = CPU_ENTRY_AREA_BASE; #endif - return 0; } __initcall(pt_dump_init); diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 8a64a6f2848d..135c9a7898c7 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -50,6 +50,7 @@ #include #include #include +#include #include #include "mm_internal.h" @@ -766,6 +767,7 @@ void __init mem_init(void) mem_init_print_info(NULL); printk(KERN_INFO "virtual kernel memory layout:\n" " fixmap : 0x%08lx - 0x%08lx (%4ld kB)\n" + " cpu_entry : 0x%08lx - 0x%08lx (%4ld kB)\n" #ifdef CONFIG_HIGHMEM " pkmap : 0x%08lx - 0x%08lx (%4ld kB)\n" #endif @@ -777,6 +779,10 @@ void __init mem_init(void) FIXADDR_START, FIXADDR_TOP, (FIXADDR_TOP - FIXADDR_START) >> 10, + CPU_ENTRY_AREA_BASE, + CPU_ENTRY_AREA_BASE + CPU_ENTRY_AREA_MAP_SIZE, + CPU_ENTRY_AREA_MAP_SIZE >> 10, + #ifdef CONFIG_HIGHMEM PKMAP_BASE, PKMAP_BASE+LAST_PKMAP*PAGE_SIZE, (LAST_PKMAP*PAGE_SIZE) >> 10, diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 9ec70d780f1f..47388f0c0e59 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -15,6 +15,7 @@ #include #include #include +#include extern struct range pfn_mapped[E820_MAX_ENTRIES]; @@ -322,31 +323,33 @@ void __init kasan_init(void) map_range(&pfn_mapped[i]); } - kasan_populate_zero_shadow( - kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), - kasan_mem_to_shadow((void *)__START_KERNEL_map)); - - kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext), - (unsigned long)kasan_mem_to_shadow(_end), - early_pfn_to_nid(__pa(_stext))); - - shadow_cpu_entry_begin = (void *)__fix_to_virt(FIX_CPU_ENTRY_AREA_BOTTOM); + shadow_cpu_entry_begin = (void 
*)CPU_ENTRY_AREA_BASE; shadow_cpu_entry_begin = kasan_mem_to_shadow(shadow_cpu_entry_begin); shadow_cpu_entry_begin = (void *)round_down((unsigned long)shadow_cpu_entry_begin, PAGE_SIZE); - shadow_cpu_entry_end = (void *)(__fix_to_virt(FIX_CPU_ENTRY_AREA_TOP) + PAGE_SIZE); + shadow_cpu_entry_end = (void *)(CPU_ENTRY_AREA_BASE + + CPU_ENTRY_AREA_MAP_SIZE); shadow_cpu_entry_end = kasan_mem_to_shadow(shadow_cpu_entry_end); shadow_cpu_entry_end = (void *)round_up((unsigned long)shadow_cpu_entry_end, PAGE_SIZE); - kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), - shadow_cpu_entry_begin); + kasan_populate_zero_shadow( + kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), + shadow_cpu_entry_begin); kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin, (unsigned long)shadow_cpu_entry_end, 0); - kasan_populate_zero_shadow(shadow_cpu_entry_end, (void *)KASAN_SHADOW_END); + kasan_populate_zero_shadow(shadow_cpu_entry_end, + kasan_mem_to_shadow((void *)__START_KERNEL_map)); + + kasan_populate_shadow((unsigned long)kasan_mem_to_shadow(_stext), + (unsigned long)kasan_mem_to_shadow(_end), + early_pfn_to_nid(__pa(_stext))); + + kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END), + (void *)KASAN_SHADOW_END); load_cr3(init_top_pgt); __flush_tlb_all(); diff --git a/arch/x86/mm/pgtable_32.c b/arch/x86/mm/pgtable_32.c index 6b9bf023a700..c3c5274410a9 100644 --- a/arch/x86/mm/pgtable_32.c +++ b/arch/x86/mm/pgtable_32.c @@ -10,6 +10,7 @@ #include #include +#include #include #include #include diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index c2454237fa67..a0e2b8c6e5c7 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -2261,7 +2261,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot) switch (idx) { case FIX_BTMAP_END ... FIX_BTMAP_BEGIN: - case FIX_RO_IDT: #ifdef CONFIG_X86_32 case FIX_WP_TEST: # ifdef CONFIG_HIGHMEM @@ -2272,7 +2271,6 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t phys, pgprot_t prot) #endif case FIX_TEXT_POKE0: case FIX_TEXT_POKE1: - case FIX_CPU_ENTRY_AREA_TOP ... FIX_CPU_ENTRY_AREA_BOTTOM: /* All local page mappings */ pte = pfn_pte(phys, prot); break; From 763f7eaf606281ccfaa2f95445219f797697ed70 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 17 Dec 2017 10:56:29 +0100 Subject: [PATCH 0761/1013] init: Invoke init_espfix_bsp() from mm_init() commit 613e396bc0d4c7604fba23256644e78454c68cf6 upstream. init_espfix_bsp() needs to be invoked before the page table isolation initialization. Move it into mm_init() which is the place where pti_init() will be added. While at it get rid of the #ifdeffery and provide proper stub functions. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/espfix.h | 7 ++++--- arch/x86/kernel/smpboot.c | 6 +----- include/asm-generic/pgtable.h | 5 +++++ init/main.c | 6 ++---- 4 files changed, 12 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/espfix.h b/arch/x86/include/asm/espfix.h index 0211029076ea..6777480d8a42 100644 --- a/arch/x86/include/asm/espfix.h +++ b/arch/x86/include/asm/espfix.h @@ -2,7 +2,7 @@ #ifndef _ASM_X86_ESPFIX_H #define _ASM_X86_ESPFIX_H -#ifdef CONFIG_X86_64 +#ifdef CONFIG_X86_ESPFIX64 #include @@ -11,7 +11,8 @@ DECLARE_PER_CPU_READ_MOSTLY(unsigned long, espfix_waddr); extern void init_espfix_bsp(void); extern void init_espfix_ap(int cpu); - -#endif /* CONFIG_X86_64 */ +#else +static inline void init_espfix_ap(int cpu) { } +#endif #endif /* _ASM_X86_ESPFIX_H */ diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 142126ab5aae..12bf07d44dfe 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -990,12 +990,8 @@ static int do_boot_cpu(int apicid, int cpu, struct task_struct *idle, initial_code = (unsigned long)start_secondary; initial_stack = idle->thread.sp; - /* - * Enable the espfix hack for this CPU - */ -#ifdef CONFIG_X86_ESPFIX64 + /* Enable the espfix hack for this CPU */ init_espfix_ap(cpu); -#endif /* So we see what's up */ announce_cpu(cpu, apicid); diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h index 1ac457511f4e..045a7f52ab3a 100644 --- a/include/asm-generic/pgtable.h +++ b/include/asm-generic/pgtable.h @@ -1025,6 +1025,11 @@ static inline int pmd_clear_huge(pmd_t *pmd) struct file; int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, unsigned long size, pgprot_t *vma_prot); + +#ifndef CONFIG_X86_ESPFIX64 +static inline void init_espfix_bsp(void) { } +#endif + #endif /* !__ASSEMBLY__ */ #ifndef io_remap_pfn_range diff --git a/init/main.c b/init/main.c index 0ee9c6866ada..8a390f60ec81 100644 --- a/init/main.c +++ b/init/main.c @@ -504,6 +504,8 @@ static void __init mm_init(void) pgtable_init(); vmalloc_init(); ioremap_huge_init(); + /* Should be run before the first non-init thread is created */ + init_espfix_bsp(); } asmlinkage __visible void __init start_kernel(void) @@ -673,10 +675,6 @@ asmlinkage __visible void __init start_kernel(void) #ifdef CONFIG_X86 if (efi_enabled(EFI_RUNTIME_SERVICES)) efi_enter_virtual_mode(); -#endif -#ifdef CONFIG_X86_ESPFIX64 - /* Should be run before the first non-init thread is created */ - init_espfix_bsp(); #endif thread_stack_cache_init(); cred_init(); From 752d01704ad18371fa6d15ef16f5dea7972be821 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 23 Dec 2017 19:45:11 +0100 Subject: [PATCH 0762/1013] x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit commit f6c4fd506cb626e4346aa81688f255e593a7c5a0 upstream. The loop which populates the CPU entry area PMDs can wrap around on 32bit machines when the number of CPUs is small. It worked wonderful for NR_CPUS=64 for whatever reason and the moron who wrote that code did not bother to test it with !SMP. Check for the wraparound to fix it. 
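To illustrate the failure mode, here is a minimal standalone sketch in userspace C (uint32_t standing in for a 32-bit unsigned long, all addresses made up, not code from this patch): once start sits close enough to 4G, "start += PMD_SIZE" wraps past zero, so a plain "start < end" test would keep populating low addresses forever. The extra "start >= base" comparison mirrors the check added to setup_cpu_entry_area_ptes().

    #include <stdio.h>
    #include <stdint.h>

    #define PMD_SIZE_32 0x400000u   /* illustrative 4M PMD, not the kernel macro */

    static void populate_range(uint32_t base, uint32_t end)
    {
            uint32_t start = base;

            /* "start >= base" stops the loop once the addition wraps */
            for (; start < end && start >= base; start += PMD_SIZE_32)
                    printf("populate_extra_pte(0x%08x)\n", (unsigned)start);
    }

    int main(void)
    {
            /* hypothetical base/end near the top of a 32-bit address space */
            populate_range(0xff800000u, 0xfffff000u);
            return 0;
    }

With these made-up values the loop runs twice and then stops when the increment wraps to zero; without the extra comparison it would cycle through the low addresses indefinitely.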
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Reported-by: kernel test robot Signed-off-by: Thomas "Feels stupid" Gleixner Tested-by: Borislav Petkov Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/cpu_entry_area.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c index 21e8b595cbb1..fe814fd5e014 100644 --- a/arch/x86/mm/cpu_entry_area.c +++ b/arch/x86/mm/cpu_entry_area.c @@ -122,7 +122,8 @@ static __init void setup_cpu_entry_area_ptes(void) start = CPU_ENTRY_AREA_BASE; end = start + CPU_ENTRY_AREA_MAP_SIZE; - for (; start < end; start += PMD_SIZE) + /* Careful here: start + PMD_SIZE might wrap around */ + for (; start < end && start >= CPU_ENTRY_AREA_BASE; start += PMD_SIZE) populate_extra_pte(start); #endif } From 7a5d5789819417d67ccb6c81f9339ee75f97370a Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 14 Dec 2017 13:31:16 +0100 Subject: [PATCH 0763/1013] ACPI: APEI / ERST: Fix missing error handling in erst_reader() commit bb82e0b4a7e96494f0c1004ce50cec3d7b5fb3d1 upstream. The commit f6f828513290 ("pstore: pass allocated memory region back to caller") changed the check of the return value from erst_read() in erst_reader() in the following way: if (len == -ENOENT) goto skip; - else if (len < 0) { - rc = -1; + else if (len < sizeof(*rcd)) { + rc = -EIO; goto out; This introduced another bug: since the comparison with sizeof() is cast to unsigned, a negative len value doesn't hit any longer. As a result, when an error is returned from erst_read(), the code falls through, and it may eventually lead to some weird thing like memory corruption. This patch adds the negative error value check more explicitly for addressing the issue. Fixes: f6f828513290 (pstore: pass allocated memory region back to caller) Tested-by: Jerry Tang Signed-off-by: Takashi Iwai Acked-by: Kees Cook Reviewed-by: Borislav Petkov Signed-off-by: Rafael J. Wysocki Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/apei/erst.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/apei/erst.c b/drivers/acpi/apei/erst.c index 2c462beee551..a943cf17faa7 100644 --- a/drivers/acpi/apei/erst.c +++ b/drivers/acpi/apei/erst.c @@ -1007,7 +1007,7 @@ static ssize_t erst_reader(struct pstore_record *record) /* The record may be cleared by others, try read next record */ if (len == -ENOENT) goto skip; - else if (len < sizeof(*rcd)) { + else if (len < 0 || len < sizeof(*rcd)) { rc = -EIO; goto out; } From 94e0c5ab52e2bdc1ddbaec32ca0e477a2737eb04 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 30 Nov 2017 19:42:52 -0800 Subject: [PATCH 0764/1013] acpi, nfit: fix health event notification commit adf6895754e2503d994a765535fd1813f8834674 upstream. Integration testing with a BIOS that generates injected health event notifications fails to communicate those events to userspace. The nfit driver neglects to link the ACPI DIMM device with the necessary driver data so acpi_nvdimm_notify() fails this lookup: nfit_mem = dev_get_drvdata(dev); if (nfit_mem && nfit_mem->flags_attr) sysfs_notify_dirent(nfit_mem->flags_attr); Add the necessary linkage when installing the notification handler and clean it up when the nfit driver instance is torn down. 
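The heart of the fix is the drvdata linkage itself. The following standalone sketch (simplified stand-in types and a hypothetical sysfs path, not driver code) shows why the lookup in the quoted notification path only succeeds if the per-DIMM state was attached to the ACPI device before the handler was installed, and why it must be detached again after the handler is removed:

    #include <stdio.h>

    struct device   { void *drvdata; };          /* stand-in, not the kernel type */
    struct nfit_mem { const char *flags_attr; }; /* stand-in, not the kernel type */

    static void notify(struct device *dev)
    {
            struct nfit_mem *nfit_mem = dev->drvdata;   /* dev_get_drvdata() */

            if (nfit_mem && nfit_mem->flags_attr)
                    printf("sysfs notify: %s\n", nfit_mem->flags_attr);
            else
                    printf("event dropped: no drvdata linked\n");
    }

    int main(void)
    {
            struct device dimm = { .drvdata = NULL };
            struct nfit_mem mem = { .flags_attr = "nmem0/nfit/flags" }; /* illustrative path */

            notify(&dimm);         /* before the fix: lookup fails, event lost  */
            dimm.drvdata = &mem;   /* fix: link state, then install the handler */
            notify(&dimm);         /* event now reaches the sysfs attribute     */
            dimm.drvdata = NULL;   /* teardown: unlink after handler removal    */
            return 0;
    }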
Cc: Toshi Kani Cc: Vishal Verma Fixes: ba9c8dd3c222 ("acpi, nfit: add dimm device notification support") Reported-by: Daniel Osawa Tested-by: Daniel Osawa Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- drivers/acpi/nfit/core.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c index 9c2c49b6a240..dea0fb3d6f64 100644 --- a/drivers/acpi/nfit/core.c +++ b/drivers/acpi/nfit/core.c @@ -1457,6 +1457,11 @@ static int acpi_nfit_add_dimm(struct acpi_nfit_desc *acpi_desc, dev_name(&adev_dimm->dev)); return -ENXIO; } + /* + * Record nfit_mem for the notification path to track back to + * the nfit sysfs attributes for this dimm device object. + */ + dev_set_drvdata(&adev_dimm->dev, nfit_mem); /* * Until standardization materializes we need to consider 4 @@ -1516,9 +1521,11 @@ static void shutdown_dimm_notify(void *data) sysfs_put(nfit_mem->flags_attr); nfit_mem->flags_attr = NULL; } - if (adev_dimm) + if (adev_dimm) { acpi_remove_notify_handler(adev_dimm->handle, ACPI_DEVICE_NOTIFY, acpi_nvdimm_notify); + dev_set_drvdata(&adev_dimm->dev, NULL); + } } mutex_unlock(&acpi_desc->init_mutex); } From 29082870f58a9b0c793bea745de52a07c74f09aa Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 29 Nov 2017 01:18:57 -0800 Subject: [PATCH 0765/1013] crypto: skcipher - set walk.iv for zero-length inputs commit 2b4f27c36bcd46e820ddb9a8e6fe6a63fa4250b8 upstream. All the ChaCha20 algorithms as well as the ARM bit-sliced AES-XTS algorithms call skcipher_walk_virt(), then access the IV (walk.iv) before checking whether any bytes need to be processed (walk.nbytes). But if the input is empty, then skcipher_walk_virt() doesn't set the IV, and the algorithms crash trying to use the uninitialized IV pointer. Fix it by setting the IV earlier in skcipher_walk_virt(). Also fix it for the AEAD walk functions. This isn't a perfect solution because we can't actually align the IV to ->cra_alignmask unless there are bytes to process, for one because the temporary buffer for the aligned IV is freed by skcipher_walk_done(), which is only called when there are bytes to process. Thus, algorithms that require aligned IVs will still need to avoid accessing the IV when walk.nbytes == 0. Still, many algorithms/architectures are fine with IVs having any alignment, and even for those that aren't, a misaligned pointer bug is much less severe than an uninitialized pointer bug. This change also matches the behavior of the older blkcipher_walk API. Fixes: 0cabf2af6f5a ("crypto: skcipher - Fix crash on zero-length input") Reported-by: syzbot Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/skcipher.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/crypto/skcipher.c b/crypto/skcipher.c index 778e0ff42bfa..11af5fd6a443 100644 --- a/crypto/skcipher.c +++ b/crypto/skcipher.c @@ -449,6 +449,8 @@ static int skcipher_walk_skcipher(struct skcipher_walk *walk, walk->total = req->cryptlen; walk->nbytes = 0; + walk->iv = req->iv; + walk->oiv = req->iv; if (unlikely(!walk->total)) return 0; @@ -456,9 +458,6 @@ static int skcipher_walk_skcipher(struct skcipher_walk *walk, scatterwalk_start(&walk->in, req->src); scatterwalk_start(&walk->out, req->dst); - walk->iv = req->iv; - walk->oiv = req->iv; - walk->flags &= ~SKCIPHER_WALK_SLEEP; walk->flags |= req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ? 
SKCIPHER_WALK_SLEEP : 0; @@ -510,6 +509,8 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk, int err; walk->nbytes = 0; + walk->iv = req->iv; + walk->oiv = req->iv; if (unlikely(!walk->total)) return 0; @@ -525,9 +526,6 @@ static int skcipher_walk_aead_common(struct skcipher_walk *walk, scatterwalk_done(&walk->in, 0, walk->total); scatterwalk_done(&walk->out, 0, walk->total); - walk->iv = req->iv; - walk->oiv = req->iv; - if (req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) walk->flags |= SKCIPHER_WALK_SLEEP; else From 88990591f0b0e7d4aedf0255fc26a09a2f899212 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Thu, 30 Nov 2017 13:39:27 +0100 Subject: [PATCH 0766/1013] crypto: mcryptd - protect the per-CPU queue with a lock commit 9abffc6f2efe46c3564c04312e52e07622d40e51 upstream. mcryptd_enqueue_request() grabs the per-CPU queue struct and protects access to it with disabled preemption. Then it schedules a worker on the same CPU. The worker in mcryptd_queue_worker() guards access to the same per-CPU variable with disabled preemption. If we take CPU-hotplug into account then it is possible that between queue_work_on() and the actual invocation of the worker the CPU goes down and the worker will be scheduled on _another_ CPU. And here the preempt_disable() protection does not work anymore. The easiest thing is to add a spin_lock() to guard access to the list. Another detail: mcryptd_queue_worker() is not processing more than MCRYPTD_BATCH invocation in a row. If there are still items left, then it will invoke queue_work() to proceed with more later. *I* would suggest to simply drop that check because it does not use a system workqueue and the workqueue is already marked as "CPU_INTENSIVE". And if preemption is required then the scheduler should do it. However if queue_work() is used then the work item is marked as CPU unbound. That means it will try to run on the local CPU but it may run on another CPU as well. Especially with CONFIG_DEBUG_WQ_FORCE_RR_CPU=y. Again, the preempt_disable() won't work here but lock which was introduced will help. In order to keep work-item on the local CPU (and avoid RR) I changed it to queue_work_on(). 
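As a userspace analogy (pthreads, not kernel code): once the consumer may run on a different CPU than the enqueuer, "stay on this CPU" guarantees no longer serialize access to the queue, but a lock stored with the queue, the role played by q_lock below and in this patch, still does.

    #include <pthread.h>
    #include <stdio.h>

    struct cpu_queue {
            pthread_mutex_t q_lock;     /* plays the role of q_lock in the patch */
            int reqs[16];
            int head, tail;
    };

    static struct cpu_queue q = { .q_lock = PTHREAD_MUTEX_INITIALIZER };

    static void enqueue(int req)
    {
            pthread_mutex_lock(&q.q_lock);
            q.reqs[q.tail++ % 16] = req;
            pthread_mutex_unlock(&q.q_lock);
    }

    static void *worker(void *arg)      /* may be scheduled on any CPU */
    {
            (void)arg;
            pthread_mutex_lock(&q.q_lock);
            while (q.head < q.tail)
                    printf("processing request %d\n", q.reqs[q.head++ % 16]);
            pthread_mutex_unlock(&q.q_lock);
            return NULL;
    }

    int main(void)
    {
            pthread_t t;

            enqueue(1);
            enqueue(2);
            pthread_create(&t, NULL, worker, NULL);
            pthread_join(t, NULL);
            return 0;
    }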
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/mcryptd.c | 23 ++++++++++------------- include/crypto/mcryptd.h | 1 + 2 files changed, 11 insertions(+), 13 deletions(-) diff --git a/crypto/mcryptd.c b/crypto/mcryptd.c index 4e6472658852..eca04d3729b3 100644 --- a/crypto/mcryptd.c +++ b/crypto/mcryptd.c @@ -81,6 +81,7 @@ static int mcryptd_init_queue(struct mcryptd_queue *queue, pr_debug("cpu_queue #%d %p\n", cpu, queue->cpu_queue); crypto_init_queue(&cpu_queue->queue, max_cpu_qlen); INIT_WORK(&cpu_queue->work, mcryptd_queue_worker); + spin_lock_init(&cpu_queue->q_lock); } return 0; } @@ -104,15 +105,16 @@ static int mcryptd_enqueue_request(struct mcryptd_queue *queue, int cpu, err; struct mcryptd_cpu_queue *cpu_queue; - cpu = get_cpu(); - cpu_queue = this_cpu_ptr(queue->cpu_queue); - rctx->tag.cpu = cpu; + cpu_queue = raw_cpu_ptr(queue->cpu_queue); + spin_lock(&cpu_queue->q_lock); + cpu = smp_processor_id(); + rctx->tag.cpu = smp_processor_id(); err = crypto_enqueue_request(&cpu_queue->queue, request); pr_debug("enqueue request: cpu %d cpu_queue %p request %p\n", cpu, cpu_queue, request); + spin_unlock(&cpu_queue->q_lock); queue_work_on(cpu, kcrypto_wq, &cpu_queue->work); - put_cpu(); return err; } @@ -161,16 +163,11 @@ static void mcryptd_queue_worker(struct work_struct *work) cpu_queue = container_of(work, struct mcryptd_cpu_queue, work); i = 0; while (i < MCRYPTD_BATCH || single_task_running()) { - /* - * preempt_disable/enable is used to prevent - * being preempted by mcryptd_enqueue_request() - */ - local_bh_disable(); - preempt_disable(); + + spin_lock_bh(&cpu_queue->q_lock); backlog = crypto_get_backlog(&cpu_queue->queue); req = crypto_dequeue_request(&cpu_queue->queue); - preempt_enable(); - local_bh_enable(); + spin_unlock_bh(&cpu_queue->q_lock); if (!req) { mcryptd_opportunistic_flush(); @@ -185,7 +182,7 @@ static void mcryptd_queue_worker(struct work_struct *work) ++i; } if (cpu_queue->queue.qlen) - queue_work(kcrypto_wq, &cpu_queue->work); + queue_work_on(smp_processor_id(), kcrypto_wq, &cpu_queue->work); } void mcryptd_flusher(struct work_struct *__work) diff --git a/include/crypto/mcryptd.h b/include/crypto/mcryptd.h index cceafa01f907..b67404fc4b34 100644 --- a/include/crypto/mcryptd.h +++ b/include/crypto/mcryptd.h @@ -27,6 +27,7 @@ static inline struct mcryptd_ahash *__mcryptd_ahash_cast( struct mcryptd_cpu_queue { struct crypto_queue queue; + spinlock_t q_lock; struct work_struct work; }; From c692698ebebed838627aa3879e30959c4a75f033 Mon Sep 17 00:00:00 2001 From: Stephan Mueller Date: Wed, 29 Nov 2017 12:02:23 +0100 Subject: [PATCH 0767/1013] crypto: af_alg - wait for data at beginning of recvmsg commit 11edb555966ed2c66c533d17c604f9d7e580a829 upstream. The wait for data is a non-atomic operation that can sleep and therefore potentially release the socket lock. The release of the socket lock allows another thread to modify the context data structure. The waiting operation for new data therefore must be called at the beginning of recvmsg. This prevents a race condition where checks of the members of the context data structure are performed by recvmsg while there is a potential for modification of these values. 
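A minimal userspace sketch of the ordering rule (a pthreads analogy, not the crypto code): the wait that drops the lock has to come before anything read from the shared context is acted upon, which is exactly why the wait is moved to the beginning of recvmsg.

    #include <pthread.h>
    #include <stdio.h>

    struct af_ctx {
            pthread_mutex_t lock;
            pthread_cond_t more;
            int used;                     /* bytes queued by the sendmsg() side */
    };

    static struct af_ctx ctx = {
            .lock = PTHREAD_MUTEX_INITIALIZER,
            .more = PTHREAD_COND_INITIALIZER,
    };

    static void *sender(void *arg)
    {
            (void)arg;
            pthread_mutex_lock(&ctx.lock);
            ctx.used = 16;                /* pretend 16 bytes arrived */
            pthread_cond_signal(&ctx.more);
            pthread_mutex_unlock(&ctx.lock);
            return NULL;
    }

    static int recv_like(void)
    {
            int n;

            pthread_mutex_lock(&ctx.lock);
            /* wait for data FIRST: the wait releases the lock while sleeping,
             * so nothing read from ctx before this point can be trusted */
            while (ctx.used == 0)
                    pthread_cond_wait(&ctx.more, &ctx.lock);
            /* only now base decisions on the context fields */
            n = ctx.used;
            ctx.used = 0;
            pthread_mutex_unlock(&ctx.lock);
            return n;
    }

    int main(void)
    {
            pthread_t t;

            pthread_create(&t, NULL, sender, NULL);
            printf("received %d bytes\n", recv_like());
            pthread_join(t, NULL);
            return 0;
    }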
Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Reported-by: syzbot Signed-off-by: Stephan Mueller Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/af_alg.c | 6 ------ crypto/algif_aead.c | 6 ++++++ crypto/algif_skcipher.c | 6 ++++++ 3 files changed, 12 insertions(+), 6 deletions(-) diff --git a/crypto/af_alg.c b/crypto/af_alg.c index e181073ef64d..6ec360213107 100644 --- a/crypto/af_alg.c +++ b/crypto/af_alg.c @@ -1165,12 +1165,6 @@ int af_alg_get_rsgl(struct sock *sk, struct msghdr *msg, int flags, if (!af_alg_readable(sk)) break; - if (!ctx->used) { - err = af_alg_wait_for_data(sk, flags); - if (err) - return err; - } - seglen = min_t(size_t, (maxsize - len), msg_data_left(msg)); diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index 3d793bc2aa82..ba07da8422c6 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -111,6 +111,12 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, size_t usedpages = 0; /* [in] RX bufs to be used from user */ size_t processed = 0; /* [in] TX bufs to be consumed */ + if (!ctx->used) { + err = af_alg_wait_for_data(sk, flags); + if (err) + return err; + } + /* * Data length provided by caller via sendmsg/sendpage that has not * yet been processed. diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index 30ee2a8e8f42..3f4458f08293 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -72,6 +72,12 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, int err = 0; size_t len = 0; + if (!ctx->used) { + err = af_alg_wait_for_data(sk, flags); + if (err) + return err; + } + /* Allocate cipher request for current operation. */ areq = af_alg_alloc_areq(sk, sizeof(struct af_alg_async_req) + crypto_skcipher_reqsize(tfm)); From f09fca41e29c68944e75ab5f65a2efcfcc88aceb Mon Sep 17 00:00:00 2001 From: Stephan Mueller Date: Fri, 8 Dec 2017 11:50:37 +0100 Subject: [PATCH 0768/1013] crypto: af_alg - fix race accessing cipher request commit d53c5135792319e095bb126bc43b2ee98586f7fe upstream. When invoking an asynchronous cipher operation, the invocation of the callback may be performed before the subsequent operations in the initial code path are invoked. The callback deletes the cipher request data structure which implies that after the invocation of the asynchronous cipher operation, this data structure must not be accessed any more. The setting of the return code size with the request data structure must therefore be moved before the invocation of the asynchronous cipher operation. Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management") Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management") Reported-by: syzbot Signed-off-by: Stephan Mueller Acked-by: Jonathan Cameron Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/algif_aead.c | 10 +++++----- crypto/algif_skcipher.c | 10 +++++----- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c index ba07da8422c6..782cb8fec323 100644 --- a/crypto/algif_aead.c +++ b/crypto/algif_aead.c @@ -291,6 +291,10 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, /* AIO operation */ sock_hold(sk); areq->iocb = msg->msg_iocb; + + /* Remember output size that will be generated. 
*/ + areq->outlen = outlen; + aead_request_set_callback(&areq->cra_u.aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG, af_alg_async_cb, areq); @@ -298,12 +302,8 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg, crypto_aead_decrypt(&areq->cra_u.aead_req); /* AIO operation in progress */ - if (err == -EINPROGRESS || err == -EBUSY) { - /* Remember output size that will be generated. */ - areq->outlen = outlen; - + if (err == -EINPROGRESS || err == -EBUSY) return -EIOCBQUEUED; - } sock_put(sk); } else { diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c index 3f4458f08293..7a3e663d54d5 100644 --- a/crypto/algif_skcipher.c +++ b/crypto/algif_skcipher.c @@ -125,6 +125,10 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, /* AIO operation */ sock_hold(sk); areq->iocb = msg->msg_iocb; + + /* Remember output size that will be generated. */ + areq->outlen = len; + skcipher_request_set_callback(&areq->cra_u.skcipher_req, CRYPTO_TFM_REQ_MAY_SLEEP, af_alg_async_cb, areq); @@ -133,12 +137,8 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg, crypto_skcipher_decrypt(&areq->cra_u.skcipher_req); /* AIO operation in progress */ - if (err == -EINPROGRESS || err == -EBUSY) { - /* Remember output size that will be generated. */ - areq->outlen = len; - + if (err == -EINPROGRESS || err == -EBUSY) return -EIOCBQUEUED; - } sock_put(sk); } else { From de3b66c01edc8ac2bea82c0337e75f6b5498bf37 Mon Sep 17 00:00:00 2001 From: Jon Hunter Date: Tue, 14 Nov 2017 14:43:27 +0000 Subject: [PATCH 0769/1013] mfd: cros ec: spi: Don't send first message too soon commit 15d8374874ded0bec37ef27f8301a6d54032c0e5 upstream. On the Tegra124 Nyan-Big chromebook the very first SPI message sent to the EC is failing. The Tegra SPI driver configures the SPI chip-selects to be active-high by default (and always has for many years). The EC SPI requires an active-low chip-select and so the Tegra chip-select is reconfigured to be active-low when the EC SPI driver calls spi_setup(). The problem is that if the first SPI message to the EC is sent too soon after reconfiguring the SPI chip-select, it fails. The EC SPI driver prevents back-to-back SPI messages being sent too soon by keeping track of the time the last transfer was sent via the variable 'last_transfer_ns'. To prevent the very first transfer being sent too soon, initialise the 'last_transfer_ns' variable after calling spi_setup() and before sending the first SPI message. Signed-off-by: Jon Hunter Reviewed-by: Brian Norris Reviewed-by: Douglas Anderson Acked-by: Benson Leung Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/cros_ec_spi.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mfd/cros_ec_spi.c b/drivers/mfd/cros_ec_spi.c index c9714072e224..a14196e95e9b 100644 --- a/drivers/mfd/cros_ec_spi.c +++ b/drivers/mfd/cros_ec_spi.c @@ -667,6 +667,7 @@ static int cros_ec_spi_probe(struct spi_device *spi) sizeof(struct ec_response_get_protocol_info); ec_dev->dout_size = sizeof(struct ec_host_request); + ec_spi->last_transfer_ns = ktime_get_ns(); err = cros_ec_register(ec_dev); if (err) { From 6300daa071b21eec1e7411609a09f050966b392e Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Sat, 11 Nov 2017 16:38:43 +0100 Subject: [PATCH 0770/1013] mfd: twl4030-audio: Fix sibling-node lookup commit 0a423772de2f3d7b00899987884f62f63ae00dcb upstream. 
A helper purported to look up a child node based on its name was using the wrong of-helper and ended up prematurely freeing the parent of-node while leaking any matching node. To make things worse, any matching node would not even necessarily be a child node as the whole device tree was searched depth-first starting at the parent. Fixes: 019a7e6b7b31 ("mfd: twl4030-audio: Add DT support") Signed-off-by: Johan Hovold Acked-by: Peter Ujfalusi Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/twl4030-audio.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/mfd/twl4030-audio.c b/drivers/mfd/twl4030-audio.c index da16bf45fab4..dc94ffc6321a 100644 --- a/drivers/mfd/twl4030-audio.c +++ b/drivers/mfd/twl4030-audio.c @@ -159,13 +159,18 @@ unsigned int twl4030_audio_get_mclk(void) EXPORT_SYMBOL_GPL(twl4030_audio_get_mclk); static bool twl4030_audio_has_codec(struct twl4030_audio_data *pdata, - struct device_node *node) + struct device_node *parent) { + struct device_node *node; + if (pdata && pdata->codec) return true; - if (of_find_node_by_name(node, "codec")) + node = of_get_child_by_name(parent, "codec"); + if (node) { + of_node_put(node); return true; + } return false; } From 637de99c7a1523b31aedb83537b0fcba8775663e Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Sat, 11 Nov 2017 16:38:44 +0100 Subject: [PATCH 0771/1013] mfd: twl6040: Fix child-node lookup commit 85e9b13cbb130a3209f21bd7933933399c389ffe upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent node was prematurely freed, while the child node was leaked. Note that the CONFIG_OF compile guard can be removed as of_get_child_by_name() provides a !CONFIG_OF implementation which always fails. Fixes: 37e13cecaa14 ("mfd: Add support for Device Tree to twl6040") Fixes: ca2cad6ae38e ("mfd: Fix twl6040 build failure") Signed-off-by: Johan Hovold Acked-by: Peter Ujfalusi Signed-off-by: Lee Jones Signed-off-by: Greg Kroah-Hartman --- drivers/mfd/twl6040.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/mfd/twl6040.c b/drivers/mfd/twl6040.c index d66502d36ba0..dd19f17a1b63 100644 --- a/drivers/mfd/twl6040.c +++ b/drivers/mfd/twl6040.c @@ -97,12 +97,16 @@ static struct reg_sequence twl6040_patch[] = { }; -static bool twl6040_has_vibra(struct device_node *node) +static bool twl6040_has_vibra(struct device_node *parent) { -#ifdef CONFIG_OF - if (of_find_node_by_name(node, "vibra")) + struct device_node *node; + + node = of_get_child_by_name(parent, "vibra"); + if (node) { + of_node_put(node); return true; -#endif + } + return false; } From 065a28657370ce3a7ebbc13030ba6231db4a6138 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Thu, 14 Dec 2017 16:44:12 +0100 Subject: [PATCH 0772/1013] ALSA: rawmidi: Avoid racy info ioctl via ctl device commit c1cfd9025cc394fd137a01159d74335c5ac978ce upstream. The rawmidi also allows to obtaining the information via ioctl of ctl API. It means that user can issue an ioctl to the rawmidi device even when it's being removed as long as the control device is present. Although the code has some protection via the global register_mutex, its range is limited to the search of the corresponding rawmidi object, and the mutex is already unlocked at accessing the rawmidi object. This may lead to a use-after-free. 
For avoiding it, this patch widens the application of register_mutex to the whole snd_rawmidi_info_select() function. We have another mutex per rawmidi object, but this operation isn't very hot path, so it shouldn't matter from the performance POV. Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/core/rawmidi.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/sound/core/rawmidi.c b/sound/core/rawmidi.c index b3b353d72527..f055ca10bbc1 100644 --- a/sound/core/rawmidi.c +++ b/sound/core/rawmidi.c @@ -579,15 +579,14 @@ static int snd_rawmidi_info_user(struct snd_rawmidi_substream *substream, return 0; } -int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info) +static int __snd_rawmidi_info_select(struct snd_card *card, + struct snd_rawmidi_info *info) { struct snd_rawmidi *rmidi; struct snd_rawmidi_str *pstr; struct snd_rawmidi_substream *substream; - mutex_lock(®ister_mutex); rmidi = snd_rawmidi_search(card, info->device); - mutex_unlock(®ister_mutex); if (!rmidi) return -ENXIO; if (info->stream < 0 || info->stream > 1) @@ -603,6 +602,16 @@ int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info } return -ENXIO; } + +int snd_rawmidi_info_select(struct snd_card *card, struct snd_rawmidi_info *info) +{ + int ret; + + mutex_lock(®ister_mutex); + ret = __snd_rawmidi_info_select(card, info); + mutex_unlock(®ister_mutex); + return ret; +} EXPORT_SYMBOL(snd_rawmidi_info_select); static int snd_rawmidi_info_select_user(struct snd_card *card, From 7a6a8463978679e3453785270543abadecb07ab7 Mon Sep 17 00:00:00 2001 From: Kailang Yang Date: Thu, 14 Dec 2017 15:28:58 +0800 Subject: [PATCH 0773/1013] ALSA: hda/realtek - Fix Dell AIO LineOut issue commit 9226665159f0367ad08bc7d5dd194aeadb90316f upstream. Dell AIO had LineOut jack. Add LineOut verb into this patch. [ Additional notes: the ALC274 codec seems requiring the fixed pin / DAC connections for HP / line-out pins for enabling EQ for speakers; i.e. the HP / LO pins expect to be connected with NID 0x03 while keeping the speaker with NID 0x02. However, by adding a new line-out pin, the auto-parser assigns the NID 0x02 for HP/LO pins as primary outputs. As an easy workaround, we provide the preferred_pairs[] to map forcibly for these pins. 
-- tiwai ] Fixes: 75ee94b20b46 ("ALSA: hda - fix headset mic problem for Dell machines with alc274") Signed-off-by: Kailang Yang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index b076386c8952..9ac4b9076ee2 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -5162,6 +5162,22 @@ static void alc233_alc662_fixup_lenovo_dual_codecs(struct hda_codec *codec, } } +/* Forcibly assign NID 0x03 to HP/LO while NID 0x02 to SPK for EQ */ +static void alc274_fixup_bind_dacs(struct hda_codec *codec, + const struct hda_fixup *fix, int action) +{ + struct alc_spec *spec = codec->spec; + static hda_nid_t preferred_pairs[] = { + 0x21, 0x03, 0x1b, 0x03, 0x16, 0x02, + 0 + }; + + if (action != HDA_FIXUP_ACT_PRE_PROBE) + return; + + spec->gen.preferred_dacs = preferred_pairs; +} + /* for hda_fixup_thinkpad_acpi() */ #include "thinkpad_helper.c" @@ -5279,6 +5295,8 @@ enum { ALC233_FIXUP_LENOVO_MULTI_CODECS, ALC294_FIXUP_LENOVO_MIC_LOCATION, ALC700_FIXUP_INTEL_REFERENCE, + ALC274_FIXUP_DELL_BIND_DACS, + ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, }; static const struct hda_fixup alc269_fixups[] = { @@ -6089,6 +6107,21 @@ static const struct hda_fixup alc269_fixups[] = { {} } }, + [ALC274_FIXUP_DELL_BIND_DACS] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc274_fixup_bind_dacs, + .chained = true, + .chain_id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE + }, + [ALC274_FIXUP_DELL_AIO_LINEOUT_VERB] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x1b, 0x0401102f }, + { } + }, + .chained = true, + .chain_id = ALC274_FIXUP_DELL_BIND_DACS + }, }; static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -6550,7 +6583,7 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { {0x14, 0x90170110}, {0x1b, 0x90a70130}, {0x21, 0x03211020}), - SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, {0x12, 0xb7a60130}, {0x13, 0xb8a61140}, {0x16, 0x90170110}, From 70709c277cdcee4e1700e9155a316700ba1446b9 Mon Sep 17 00:00:00 2001 From: Guneshwor Singh Date: Thu, 7 Dec 2017 18:06:20 +0530 Subject: [PATCH 0774/1013] ALSA: hda - Add vendor id for Cannonlake HDMI codec commit 2b4584d00a6bc02b63ab3c7213060d41a74bdff1 upstream. Cannonlake HDMI codec has the same nid as Geminilake. This adds the codec entry for it. 
Signed-off-by: Guneshwor Singh Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_hdmi.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/sound/pci/hda/patch_hdmi.c b/sound/pci/hda/patch_hdmi.c index c19c81d230bd..b4f1b6e88305 100644 --- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -55,10 +55,11 @@ MODULE_PARM_DESC(static_hdmi_pcm, "Don't restrict PCM parameters per ELD info"); #define is_kabylake(codec) ((codec)->core.vendor_id == 0x8086280b) #define is_geminilake(codec) (((codec)->core.vendor_id == 0x8086280d) || \ ((codec)->core.vendor_id == 0x80862800)) +#define is_cannonlake(codec) ((codec)->core.vendor_id == 0x8086280c) #define is_haswell_plus(codec) (is_haswell(codec) || is_broadwell(codec) \ || is_skylake(codec) || is_broxton(codec) \ - || is_kabylake(codec)) || is_geminilake(codec) - + || is_kabylake(codec)) || is_geminilake(codec) \ + || is_cannonlake(codec) #define is_valleyview(codec) ((codec)->core.vendor_id == 0x80862882) #define is_cherryview(codec) ((codec)->core.vendor_id == 0x80862883) #define is_valleyview_plus(codec) (is_valleyview(codec) || is_cherryview(codec)) @@ -3841,6 +3842,7 @@ HDA_CODEC_ENTRY(0x80862808, "Broadwell HDMI", patch_i915_hsw_hdmi), HDA_CODEC_ENTRY(0x80862809, "Skylake HDMI", patch_i915_hsw_hdmi), HDA_CODEC_ENTRY(0x8086280a, "Broxton HDMI", patch_i915_hsw_hdmi), HDA_CODEC_ENTRY(0x8086280b, "Kabylake HDMI", patch_i915_hsw_hdmi), +HDA_CODEC_ENTRY(0x8086280c, "Cannonlake HDMI", patch_i915_glk_hdmi), HDA_CODEC_ENTRY(0x8086280d, "Geminilake HDMI", patch_i915_glk_hdmi), HDA_CODEC_ENTRY(0x80862800, "Geminilake HDMI", patch_i915_glk_hdmi), HDA_CODEC_ENTRY(0x80862880, "CedarTrail HDMI", patch_generic_hdmi), From 39384674586f371503728661d0ded892c348717a Mon Sep 17 00:00:00 2001 From: Jussi Laako Date: Thu, 7 Dec 2017 12:58:33 +0200 Subject: [PATCH 0775/1013] ALSA: usb-audio: Add native DSD support for Esoteric D-05X commit 866f7ed7d67936dcdbcddc111c8af878c918fe7c upstream. Adds VID:PID of Esoteric D-05X to the TEAC device id's. Renames the is_teac_50X_dac() function to is_teac_dsd_dac() to cover broader device family from the same corporation sharing the same USB audio implementation. Signed-off-by: Jussi Laako Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/usb/quirks.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/sound/usb/quirks.c b/sound/usb/quirks.c index 20624320b753..8d7db7cd4f88 100644 --- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -1172,10 +1172,11 @@ static bool is_marantz_denon_dac(unsigned int id) /* TEAC UD-501/UD-503/NT-503 USB DACs need a vendor cmd to switch * between PCM/DOP and native DSD mode */ -static bool is_teac_50X_dac(unsigned int id) +static bool is_teac_dsd_dac(unsigned int id) { switch (id) { case USB_ID(0x0644, 0x8043): /* TEAC UD-501/UD-503/NT-503 */ + case USB_ID(0x0644, 0x8044): /* Esoteric D-05X */ return true; } return false; @@ -1208,7 +1209,7 @@ int snd_usb_select_mode_quirk(struct snd_usb_substream *subs, break; } mdelay(20); - } else if (is_teac_50X_dac(subs->stream->chip->usb_id)) { + } else if (is_teac_dsd_dac(subs->stream->chip->usb_id)) { /* Vendor mode switch cmd is required. 
*/ switch (fmt->altsetting) { case 3: /* DSD mode (DSD_U32) requested */ @@ -1398,7 +1399,7 @@ u64 snd_usb_interface_dsd_format_quirks(struct snd_usb_audio *chip, } /* TEAC devices with USB DAC functionality */ - if (is_teac_50X_dac(chip->usb_id)) { + if (is_teac_dsd_dac(chip->usb_id)) { if (fp->altsetting == 3) return SNDRV_PCM_FMTBIT_DSD_U32_BE; } From 2c7b98ffac666d4f54d1b57b82c3341f473f7dc5 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 18 Dec 2017 23:36:57 +0100 Subject: [PATCH 0776/1013] ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU commit 5a15f289ee87eaf33f13f08a4909ec99d837ec5f upstream. The commit 89b89d121ffc ("ALSA: usb-audio: Add check return value for usb_string()") added the check of the return value from snd_usb_copy_string_desc(), which is correct per se, but it introduced a regression. In the original code, either the "Clock Source", "Playback Source" or "Capture Source" suffix is added after the terminal string, while the commit changed it to add the suffix only when get_term_name() is failing. It ended up with an incorrect ctl name like "PCM" instead of "PCM Capture Source". Also, even the original code has a similar bug: when the ctl name is generated from snd_usb_copy_string_desc() for the given iSelector, it also doesn't put the suffix. This patch addresses these issues: the suffix is added always when no static mapping is found. Also the patch tries to put more comments and cleans up the if/else block for better readability in order to avoid the same pitfall again. Fixes: 89b89d121ffc ("ALSA: usb-audio: Add check return value for usb_string()") Reported-and-tested-by: Mauro Santos Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/usb/mixer.c | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c index 4fde4f8d4444..75bce127d768 100644 --- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -2173,20 +2173,25 @@ static int parse_audio_selector_unit(struct mixer_build *state, int unitid, kctl->private_value = (unsigned long)namelist; kctl->private_free = usb_mixer_selector_elem_free; - nameid = uac_selector_unit_iSelector(desc); + /* check the static mapping table at first */ len = check_mapped_name(map, kctl->id.name, sizeof(kctl->id.name)); - if (len) - ; - else if (nameid) - len = snd_usb_copy_string_desc(state, nameid, kctl->id.name, - sizeof(kctl->id.name)); - else - len = get_term_name(state, &state->oterm, - kctl->id.name, sizeof(kctl->id.name), 0); - if (!len) { - strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name)); + /* no mapping ? */ + /* if iSelector is given, use it */ + nameid = uac_selector_unit_iSelector(desc); + if (nameid) + len = snd_usb_copy_string_desc(state, nameid, + kctl->id.name, + sizeof(kctl->id.name)); + /* ... or pick up the terminal name at next */ + if (!len) + len = get_term_name(state, &state->oterm, + kctl->id.name, sizeof(kctl->id.name), 0); + /* ... or use the fixed string "USB" as the last resort */ + if (!len) + strlcpy(kctl->id.name, "USB", sizeof(kctl->id.name)); + /* and add the proper suffix */ if (desc->bDescriptorSubtype == UAC2_CLOCK_SELECTOR) append_ctl_name(kctl, " Clock Source"); else if ((state->oterm.type & 0xff00) == 0x0100) From d8f477a5cd2021552b135764b77df7c39b7dc504 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 15 Dec 2017 03:07:18 +0100 Subject: [PATCH 0777/1013] PCI / PM: Force devices to D0 in pci_pm_thaw_noirq() commit 5839ee7389e893a31e4e3c9cf17b50d14103c902 upstream. 
It is incorrect to call pci_restore_state() for devices in low-power states (D1-D3), as that involves the restoration of MSI setup which requires MMIO to be operational and that is only the case in D0. However, pci_pm_thaw_noirq() may do that if the driver's "freeze" callbacks put the device into a low-power state, so fix it by making it force devices into D0 via pci_set_power_state() instead of trying to "update" their power state which is pointless. Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation) Reported-by: Thomas Gleixner Reported-by: Maarten Lankhorst Tested-by: Thomas Gleixner Tested-by: Maarten Lankhorst Signed-off-by: Rafael J. Wysocki Acked-by: Bjorn Helgaas Signed-off-by: Greg Kroah-Hartman --- drivers/pci/pci-driver.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c index 11bd267fc137..bb0927de79dd 100644 --- a/drivers/pci/pci-driver.c +++ b/drivers/pci/pci-driver.c @@ -968,7 +968,12 @@ static int pci_pm_thaw_noirq(struct device *dev) if (pci_has_legacy_pm_support(pci_dev)) return pci_legacy_resume_early(dev); - pci_update_current_state(pci_dev, PCI_D0); + /* + * pci_restore_state() requires the device to be in D0 (because of MSI + * restoration among other things), so force it into D0 in case the + * driver's "freeze" callbacks put it into a low-power state directly. + */ + pci_set_power_state(pci_dev, PCI_D0); pci_restore_state(pci_dev); if (drv && drv->pm && drv->pm->thaw_noirq) From f349652293fb79fbd5768dfc18605200177b74d7 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 20 Dec 2017 13:13:58 -0700 Subject: [PATCH 0778/1013] block: unalign call_single_data in struct request commit 4ccafe032005e9b96acbef2e389a4de5b1254add upstream. A previous change blindly added massive alignment to the call_single_data structure in struct request. This ballooned it in size from 296 to 320 bytes on my setup, for no valid reason at all. Use the unaligned struct __call_single_data variant instead. Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data") Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 8da66379f7ea..fd47bd96b5d3 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -135,7 +135,7 @@ typedef __u32 __bitwise req_flags_t; struct request { struct list_head queuelist; union { - call_single_data_t csd; + struct __call_single_data csd; u64 fifo_time; }; From 3ef1c33f98c82876555813e301aca7a765f28f92 Mon Sep 17 00:00:00 2001 From: Shaohua Li Date: Wed, 20 Dec 2017 11:10:17 -0700 Subject: [PATCH 0779/1013] block-throttle: avoid double charge commit 111be883981748acc9a56e855c8336404a8e787c upstream. If a bio is throttled and split after throttling, the bio could be resubmitted and enter throttling again. This will cause part of the bio to be charged multiple times. If the cgroup has an IO limit, the double charge will significantly harm the performance. The bio split becomes quite common after arbitrary bio size change. To fix this, we always set the BIO_THROTTLED flag if a bio is throttled. If the bio is cloned/split, we copy the flag to the new bio too to avoid a double charge. However, a cloned bio could be directed to a new disk, where keeping the flag would be a problem. The observation is that we always set a new disk for the bio in this case, so we can clear the flag in bio_set_dev().
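A minimal userspace toy model of the BIO_THROTTLED bookkeeping described above (purely illustrative, not kernel code; all names are invented):

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a bio: just a target disk and the throttled flag. */
struct toy_bio {
	int disk;
	bool throttled;
};

/* Charge the cgroup only the first time the bio passes through throttling. */
static void toy_throttle(struct toy_bio *bio)
{
	if (!bio->throttled)
		printf("charging bio on disk %d\n", bio->disk);
	bio->throttled = true;
}

/* Redirecting the bio to a different disk drops the flag, as bio_set_dev() now does. */
static void toy_set_dev(struct toy_bio *bio, int disk)
{
	if (bio->disk != disk)
		bio->throttled = false;
	bio->disk = disk;
}

int main(void)
{
	struct toy_bio bio = { .disk = 0, .throttled = false };
	struct toy_bio split;

	toy_throttle(&bio);	/* charged on disk 0 */
	split = bio;		/* clone/split copies the flag */
	toy_throttle(&split);	/* no double charge */
	toy_set_dev(&split, 1);	/* redirected to another disk */
	toy_throttle(&split);	/* charged again, this time on disk 1 */
	return 0;
}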
This issue exists for a long time, arbitrary bio size change just makes it worse, so this should go into stable at least since v4.2. V1-> V2: Not add extra field in bio based on discussion with Tejun Cc: Vivek Goyal Acked-by: Tejun Heo Signed-off-by: Shaohua Li Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- block/bio.c | 2 ++ block/blk-throttle.c | 8 +------- include/linux/bio.h | 2 ++ include/linux/blk_types.h | 9 ++++----- 4 files changed, 9 insertions(+), 12 deletions(-) diff --git a/block/bio.c b/block/bio.c index 33fa6b4af312..7f978eac9a7a 100644 --- a/block/bio.c +++ b/block/bio.c @@ -599,6 +599,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src) bio->bi_disk = bio_src->bi_disk; bio->bi_partno = bio_src->bi_partno; bio_set_flag(bio, BIO_CLONED); + if (bio_flagged(bio_src, BIO_THROTTLED)) + bio_set_flag(bio, BIO_THROTTLED); bio->bi_opf = bio_src->bi_opf; bio->bi_write_hint = bio_src->bi_write_hint; bio->bi_iter = bio_src->bi_iter; diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 8631763866c6..a8cd7b3d9647 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -2223,13 +2223,7 @@ bool blk_throtl_bio(struct request_queue *q, struct blkcg_gq *blkg, out_unlock: spin_unlock_irq(q->queue_lock); out: - /* - * As multiple blk-throtls may stack in the same issue path, we - * don't want bios to leave with the flag set. Clear the flag if - * being issued. - */ - if (!throttled) - bio_clear_flag(bio, BIO_THROTTLED); + bio_set_flag(bio, BIO_THROTTLED); #ifdef CONFIG_BLK_DEV_THROTTLING_LOW if (throttled || !td->track_bio_latency) diff --git a/include/linux/bio.h b/include/linux/bio.h index 275c91c99516..45f00dd6323c 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -504,6 +504,8 @@ extern unsigned int bvec_nr_vecs(unsigned short idx); #define bio_set_dev(bio, bdev) \ do { \ + if ((bio)->bi_disk != (bdev)->bd_disk) \ + bio_clear_flag(bio, BIO_THROTTLED);\ (bio)->bi_disk = (bdev)->bd_disk; \ (bio)->bi_partno = (bdev)->bd_partno; \ } while (0) diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 96ac3815542c..1c8a8a2aedf7 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -50,8 +50,6 @@ struct blk_issue_stat { struct bio { struct bio *bi_next; /* request queue link */ struct gendisk *bi_disk; - u8 bi_partno; - blk_status_t bi_status; unsigned int bi_opf; /* bottom bits req flags, * top bits REQ_OP. Use * accessors. @@ -59,8 +57,8 @@ struct bio { unsigned short bi_flags; /* status, etc and bvec pool number */ unsigned short bi_ioprio; unsigned short bi_write_hint; - - struct bvec_iter bi_iter; + blk_status_t bi_status; + u8 bi_partno; /* Number of segments in this BIO after * physical address coalescing is performed. @@ -74,8 +72,9 @@ struct bio { unsigned int bi_seg_front_size; unsigned int bi_seg_back_size; - atomic_t __bi_remaining; + struct bvec_iter bi_iter; + atomic_t __bi_remaining; bio_end_io_t *bi_end_io; void *bi_private; From 482b6942a8c1e5110a44787f6e72e0d0775bf4c0 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Tue, 12 Dec 2017 21:25:41 +0100 Subject: [PATCH 0780/1013] parisc: Align os_hpmc_size on word boundary commit 0ed9d3de5f8f97e6efd5ca0e3377cab5f0451ead upstream. The os_hpmc_size variable sometimes wasn't aligned at word boundary and thus triggered the unaligned fault handler at startup. Fix it by aligning it properly. 
Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/kernel/hpmc.S | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/parisc/kernel/hpmc.S b/arch/parisc/kernel/hpmc.S index e3a8e5e4d5de..8d072c44f300 100644 --- a/arch/parisc/kernel/hpmc.S +++ b/arch/parisc/kernel/hpmc.S @@ -305,6 +305,7 @@ ENDPROC_CFI(os_hpmc) __INITRODATA + .align 4 .export os_hpmc_size os_hpmc_size: .word .os_hpmc_end-.os_hpmc From 117b8b85e5777b76128a808533c4f711be564360 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Tue, 12 Dec 2017 21:32:16 +0100 Subject: [PATCH 0781/1013] parisc: Fix indenting in puts() commit 203c110b39a89b48156c7450504e454fedb7f7f6 upstream. Static analysis tools complain that we intended to have curly braces around this indent block. In this case this assumption is wrong, so fix the indenting. Fixes: 2f3c7b8137ef ("parisc: Add core code for self-extracting kernel") Reported-by: Dan Carpenter Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/boot/compressed/misc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/parisc/boot/compressed/misc.c b/arch/parisc/boot/compressed/misc.c index 9345b44b86f0..f57118e1f6b4 100644 --- a/arch/parisc/boot/compressed/misc.c +++ b/arch/parisc/boot/compressed/misc.c @@ -123,8 +123,8 @@ int puts(const char *s) while ((nuline = strchr(s, '\n')) != NULL) { if (nuline != s) pdc_iodc_print(s, nuline - s); - pdc_iodc_print("\r\n", 2); - s = nuline + 1; + pdc_iodc_print("\r\n", 2); + s = nuline + 1; } if (*s != '\0') pdc_iodc_print(s, strlen(s)); From 13a41fbd867ae482650aab1cee068e1f7f5ad94a Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Tue, 12 Dec 2017 21:52:26 +0100 Subject: [PATCH 0782/1013] parisc: Hide Diva-built-in serial aux and graphics card commit bcf3f1752a622f1372d3252d0fea8855d89812e7 upstream. Diva GSP card has built-in serial AUX port and ATI graphic card which simply don't work and which both don't have external connectors. User Guides even mention that those devices shouldn't be used. So, prevent that Linux drivers try to enable those devices. Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- drivers/parisc/lba_pci.c | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/drivers/parisc/lba_pci.c b/drivers/parisc/lba_pci.c index a25fed52f7e9..41b740aed3a3 100644 --- a/drivers/parisc/lba_pci.c +++ b/drivers/parisc/lba_pci.c @@ -1692,3 +1692,36 @@ void lba_set_iregs(struct parisc_device *lba, u32 ibase, u32 imask) iounmap(base_addr); } + +/* + * The design of the Diva management card in rp34x0 machines (rp3410, rp3440) + * seems rushed, so that many built-in components simply don't work. + * The following quirks disable the serial AUX port and the built-in ATI RV100 + * Radeon 7000 graphics card which both don't have any external connectors and + * thus are useless, and even worse, e.g. the AUX port occupies ttyS0 and as + * such makes those machines the only PARISC machines on which we can't use + * ttyS0 as boot console. 
+ */ +static void quirk_diva_ati_card(struct pci_dev *dev) +{ + if (dev->subsystem_vendor != PCI_VENDOR_ID_HP || + dev->subsystem_device != 0x1292) + return; + + dev_info(&dev->dev, "Hiding Diva built-in ATI card"); + dev->device = 0; +} +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_RADEON_QY, + quirk_diva_ati_card); + +static void quirk_diva_aux_disable(struct pci_dev *dev) +{ + if (dev->subsystem_vendor != PCI_VENDOR_ID_HP || + dev->subsystem_device != 0x1291) + return; + + dev_info(&dev->dev, "Hiding Diva built-in AUX serial device"); + dev->device = 0; +} +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HP, PCI_DEVICE_ID_HP_DIVA_AUX, + quirk_diva_aux_disable); From 1b1f78c02c79d8e5398d3a0cbf45c4a3dee2e9ee Mon Sep 17 00:00:00 2001 From: John David Anglin Date: Mon, 13 Nov 2017 19:35:33 -0500 Subject: [PATCH 0783/1013] Revert "parisc: Re-enable interrupts early" commit 9352aeada4d8d8753fc0e414fbfe8fdfcb68a12c upstream. This reverts commit 5c38602d83e584047906b41b162ababd4db4106d. Interrupts can't be enabled early because the register saves are done on the thread stack prior to switching to the IRQ stack. This caused stack overflows and the thread stack needed increasing to 32k. Even then, stack overflows still occasionally occurred. Background: Even with a 32 kB thread stack, I have seen instances where the thread stack overflowed on the mx3210 buildd. Detection of stack overflow only occurs when we have an external interrupt. When an external interrupt occurs, we switch to the thread stack if we are not already on a kernel stack. Then, registers and specials are saved to the kernel stack. The bug occurs in intr_return where interrupts are reenabled prior to returning from the interrupt. This was done incase we need to schedule or deliver signals. However, it introduces the possibility that multiple external interrupts may occur on the thread stack and cause a stack overflow. These might not be detected and cause the kernel to misbehave in random ways. This patch changes the code back to only reenable interrupts when we are going to schedule or deliver signals. As a result, we generally return from an interrupt before reenabling interrupts. This minimizes the growth of the thread stack. Fixes: 5c38602d83e5 ("parisc: Re-enable interrupts early") Signed-off-by: John David Anglin Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/kernel/entry.S | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S index a4fd296c958e..f3cecf5117cf 100644 --- a/arch/parisc/kernel/entry.S +++ b/arch/parisc/kernel/entry.S @@ -878,9 +878,6 @@ ENTRY_CFI(syscall_exit_rfi) STREG %r19,PT_SR7(%r16) intr_return: - /* NOTE: Need to enable interrupts incase we schedule. */ - ssm PSW_SM_I, %r0 - /* check for reschedule */ mfctl %cr30,%r1 LDREG TI_FLAGS(%r1),%r19 /* sched.h: TIF_NEED_RESCHED */ @@ -907,6 +904,11 @@ intr_check_sig: LDREG PT_IASQ1(%r16), %r20 cmpib,COND(=),n 0,%r20,intr_restore /* backward */ + /* NOTE: We need to enable interrupts if we have to deliver + * signals. We used to do this earlier but it caused kernel + * stack overflows. */ + ssm PSW_SM_I, %r0 + copy %r0, %r25 /* long in_syscall = 0 */ #ifdef CONFIG_64BIT ldo -16(%r30),%r29 /* Reference param save area */ @@ -958,6 +960,10 @@ intr_do_resched: cmpib,COND(=) 0, %r20, intr_do_preempt nop + /* NOTE: We need to enable interrupts if we schedule. We used + * to do this earlier but it caused kernel stack overflows. 
*/ + ssm PSW_SM_I, %r0 + #ifdef CONFIG_64BIT ldo -16(%r30),%r29 /* Reference param save area */ #endif From e8f28db0f737690e3d7ecb26cfaf60c7af74f7e5 Mon Sep 17 00:00:00 2001 From: Ricardo Ribalda Delgado Date: Tue, 21 Nov 2017 10:09:02 +0100 Subject: [PATCH 0784/1013] spi: xilinx: Detect stall with Unknown commands commit 5a1314fa697fc65cefaba64cd4699bfc3e6882a6 upstream. When the core is configured in C_SPI_MODE > 0, it integrates a lookup table that automatically configures the core in dual or quad mode based on the command (first byte on the tx fifo). Unfortunately, that list mode_?_memoy_*.mif does not contain all the commands supported by the flash. Since 4.14 spi-nor automatically tries to probe the flash using SFDP (command 0x5a), and that command is not part of the list_mode table. With the right combination of C_SPI_MODE and C_SPI_MEMORY this leads to a stall that can only be recovered with a soft reset. This patch detects this kind of stall and returns -EIO to the caller on those commands. spi-nor can handle this error properly: m25p80 spi0.0: Detected stall. Check C_SPI_MODE and C_SPI_MEMORY. 0x21 0x2404 m25p80 spi0.0: SPI transfer failed: -5 spi_master spi0: failed to transfer one message from queue m25p80 spi0.0: s25sl064p (8192 Kbytes) Signed-off-by: Ricardo Ribalda Delgado Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi-xilinx.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/drivers/spi/spi-xilinx.c b/drivers/spi/spi-xilinx.c index bc7100b93dfc..e0b9fe1d0e37 100644 --- a/drivers/spi/spi-xilinx.c +++ b/drivers/spi/spi-xilinx.c @@ -271,6 +271,7 @@ static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t) while (remaining_words) { int n_words, tx_words, rx_words; u32 sr; + int stalled; n_words = min(remaining_words, xspi->buffer_size); @@ -299,7 +300,17 @@ static int xilinx_spi_txrx_bufs(struct spi_device *spi, struct spi_transfer *t) /* Read out all the data from the Rx FIFO */ rx_words = n_words; + stalled = 10; while (rx_words) { + if (rx_words == n_words && !(stalled--) && + !(sr & XSPI_SR_TX_EMPTY_MASK) && + (sr & XSPI_SR_RX_EMPTY_MASK)) { + dev_err(&spi->dev, + "Detected stall. Check C_SPI_MODE and C_SPI_MEMORY\n"); + xspi_init_hw(xspi); + return -EIO; + } + if ((sr & XSPI_SR_TX_EMPTY_MASK) && (rx_words > 1)) { xilinx_spi_rx(xspi); rx_words--; From ed1918e287547d92a795729993ce8e6ca7940eda Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Mon, 27 Nov 2017 15:16:32 +0100 Subject: [PATCH 0785/1013] spi: a3700: Fix clk prescaling for coefficient over 15 commit 251c201bf4f8b5bf4f1ccb4f8920eed2e1f57580 upstream. The Armada 3700 SPI controller has 2 ranges of prescaler coefficients. One ranging from 0 to 15 by steps of 1, and one ranging from 0 to 30 by steps of 2. This commit fixes the prescaler coefficients that are over 15 so that it uses the correct range of values. The prescaling coefficient is rounded to the upper value if it is odd. This was tested on Espressobin with spidev and a logical analyser.
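The prescaler selection described above can be checked with a small standalone sketch (illustrative only, not the driver code; the 160 MHz input clock is an assumed example value):

#include <stdio.h>

#define A3700_SPI_CLK_EVEN_OFFS	0x10
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/* Mirrors the fixed-up computation: coefficients up to 15 are used directly,
 * larger ones are rounded up to an even divider and encoded in the even-step
 * range starting at offset 0x10. */
static unsigned int a3700_prescale(unsigned long clk_hz, unsigned long speed_hz)
{
	unsigned int prescale = DIV_ROUND_UP(clk_hz, speed_hz);

	if (prescale > 15)
		prescale = A3700_SPI_CLK_EVEN_OFFS + DIV_ROUND_UP(prescale, 2);
	return prescale;
}

int main(void)
{
	/* 160 MHz / 10 MHz -> divider 16 -> field 0x10 + 8 = 0x18 */
	printf("0x%x\n", a3700_prescale(160000000UL, 10000000UL));
	/* 160 MHz / 9 MHz -> divider 18 -> field 0x10 + 9 = 0x19 */
	printf("0x%x\n", a3700_prescale(160000000UL, 9000000UL));
	return 0;
}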
Signed-off-by: Maxime Chevallier Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- drivers/spi/spi-armada-3700.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/spi/spi-armada-3700.c b/drivers/spi/spi-armada-3700.c index 568e1c65aa82..fe3fa1e8517a 100644 --- a/drivers/spi/spi-armada-3700.c +++ b/drivers/spi/spi-armada-3700.c @@ -79,6 +79,7 @@ #define A3700_SPI_BYTE_LEN BIT(5) #define A3700_SPI_CLK_PRESCALE BIT(0) #define A3700_SPI_CLK_PRESCALE_MASK (0x1f) +#define A3700_SPI_CLK_EVEN_OFFS (0x10) #define A3700_SPI_WFIFO_THRS_BIT 28 #define A3700_SPI_RFIFO_THRS_BIT 24 @@ -220,6 +221,13 @@ static void a3700_spi_clock_set(struct a3700_spi *a3700_spi, prescale = DIV_ROUND_UP(clk_get_rate(a3700_spi->clk), speed_hz); + /* For prescaler values over 15, we can only set it by steps of 2. + * Starting from A3700_SPI_CLK_EVEN_OFFS, we set values from 0 up to + * 30. We only use this range from 16 to 30. + */ + if (prescale > 15) + prescale = A3700_SPI_CLK_EVEN_OFFS + DIV_ROUND_UP(prescale, 2); + val = spireg_read(a3700_spi, A3700_SPI_IF_CFG_REG); val = val & ~A3700_SPI_CLK_PRESCALE_MASK; From efc9b7ae524dfbd4cb9122d2f1d4e9c33d0f11dd Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Mon, 4 Dec 2017 12:11:02 +0300 Subject: [PATCH 0786/1013] pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems commit d2b3c353595a855794f8b9df5b5bdbe8deb0c413 upstream. Guenter Roeck reported an interrupt storm on a prototype system which is based on Cyan Chromebook. The root cause turned out to be a incorrectly configured pin that triggers spurious interrupts. This will be fixed in coreboot but currently we need to prevent the interrupt storm from happening by masking all interrupts (but not GPEs) on those systems. Link: https://bugzilla.kernel.org/show_bug.cgi?id=197953 Fixes: bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in probe") Reported-and-tested-by: Guenter Roeck Reported-by: Dmitry Torokhov Signed-off-by: Mika Westerberg Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/pinctrl/intel/pinctrl-cherryview.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/drivers/pinctrl/intel/pinctrl-cherryview.c b/drivers/pinctrl/intel/pinctrl-cherryview.c index fadbca907c7c..0907531a02ca 100644 --- a/drivers/pinctrl/intel/pinctrl-cherryview.c +++ b/drivers/pinctrl/intel/pinctrl-cherryview.c @@ -1620,6 +1620,22 @@ static int chv_gpio_probe(struct chv_pinctrl *pctrl, int irq) clear_bit(i, chip->irq_valid_mask); } + /* + * The same set of machines in chv_no_valid_mask[] have incorrectly + * configured GPIOs that generate spurious interrupts so we use + * this same list to apply another quirk for them. + * + * See also https://bugzilla.kernel.org/show_bug.cgi?id=197953. + */ + if (!need_valid_mask) { + /* + * Mask all interrupts the community is able to generate + * but leave the ones that can only generate GPEs unmasked. + */ + chv_writel(GENMASK(31, pctrl->community->nirqs), + pctrl->regs + CHV_INTMASK); + } + /* Clear all interrupts */ chv_writel(0xffff, pctrl->regs + CHV_INTSTAT); From 8a74c8e87e46fa85fbd0c73273c831559c25d576 Mon Sep 17 00:00:00 2001 From: Julien Thierry Date: Wed, 6 Dec 2017 17:09:49 +0000 Subject: [PATCH 0787/1013] arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu commit bfe766cf65fb65e68c4764f76158718560bdcee5 upstream. When VHE is not present, KVM needs to save and restores PMSCR_EL1 when possible. If SPE is used by the host, value of PMSCR_EL1 cannot be saved for the guest. 
If the host starts using SPE between two save+restore sequences on the same vcpu, restore will write the value of PMSCR_EL1 read during the first save. Make sure __debug_save_spe_nvhe clears the value of the saved PMSCR_EL1 when the guest cannot use SPE. Signed-off-by: Julien Thierry Cc: Christoffer Dall Cc: Marc Zyngier Cc: Catalin Marinas Reviewed-by: Will Deacon Reviewed-by: Christoffer Dall Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- arch/arm64/kvm/hyp/debug-sr.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm64/kvm/hyp/debug-sr.c b/arch/arm64/kvm/hyp/debug-sr.c index f5154ed3da6c..2add22699764 100644 --- a/arch/arm64/kvm/hyp/debug-sr.c +++ b/arch/arm64/kvm/hyp/debug-sr.c @@ -84,6 +84,9 @@ static void __hyp_text __debug_save_spe_nvhe(u64 *pmscr_el1) { u64 reg; + /* Clear pmscr in case of early return */ + *pmscr_el1 = 0; + /* SPE present on this CPU? */ if (!cpuid_feature_extract_unsigned_field(read_sysreg(id_aa64dfr0_el1), ID_AA64DFR0_PMSVER_SHIFT)) From 4e9cca9267feb13658a3bb6785a5649b76e684e0 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 7 Dec 2017 11:45:45 +0000 Subject: [PATCH 0788/1013] KVM: arm/arm64: Fix HYP unmapping going off limits commit 7839c672e58bf62da8f2f0197fefb442c02ba1dd upstream. When we unmap the HYP memory, we try to be clever and unmap one PGD at a time. If we start with a non-PGD aligned address and try to unmap a whole PGD, things go horribly wrong in unmap_hyp_range (addr and end can never match, and it all goes really badly as we keep incrementing pgd and parsing random memory as page tables...). The obvious fix is to let unmap_hyp_range do what it does best, which is to iterate over a range. The size of the linear mapping, which begins at PAGE_OFFSET, can be easily calculated by subtracting PAGE_OFFSET from high_memory, because high_memory is defined as the linear map address of the last byte of DRAM, plus one. The size of the vmalloc region is given trivially by VMALLOC_END - VMALLOC_START.
Reported-by: Andre Przywara Tested-by: Andre Przywara Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier Signed-off-by: Christoffer Dall Signed-off-by: Greg Kroah-Hartman --- virt/kvm/arm/mmu.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c index b36945d49986..b4b69c2d1012 100644 --- a/virt/kvm/arm/mmu.c +++ b/virt/kvm/arm/mmu.c @@ -509,8 +509,6 @@ static void unmap_hyp_range(pgd_t *pgdp, phys_addr_t start, u64 size) */ void free_hyp_pgds(void) { - unsigned long addr; - mutex_lock(&kvm_hyp_pgd_mutex); if (boot_hyp_pgd) { @@ -521,10 +519,10 @@ void free_hyp_pgds(void) if (hyp_pgd) { unmap_hyp_range(hyp_pgd, hyp_idmap_start, PAGE_SIZE); - for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE) - unmap_hyp_range(hyp_pgd, kern_hyp_va(addr), PGDIR_SIZE); - for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE) - unmap_hyp_range(hyp_pgd, kern_hyp_va(addr), PGDIR_SIZE); + unmap_hyp_range(hyp_pgd, kern_hyp_va(PAGE_OFFSET), + (uintptr_t)high_memory - PAGE_OFFSET); + unmap_hyp_range(hyp_pgd, kern_hyp_va(VMALLOC_START), + VMALLOC_END - VMALLOC_START); free_pages((unsigned long)hyp_pgd, hyp_pgd_order); hyp_pgd = NULL; From 8708f682836827e8c593e370430a3156766b2fe2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= Date: Tue, 12 Dec 2017 12:02:04 +0000 Subject: [PATCH 0789/1013] KVM: PPC: Book3S: fix XIVE migration of pending interrupts MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit dc1c4165d189350cb51bdd3057deb6ecd164beda upstream. When restoring a pending interrupt, we are setting the Q bit to force a retrigger in xive_finish_unmask(). But we also need to force an EOI in this case to reach the same initial state : P=1, Q=0. This can be done by not setting 'old_p' for pending interrupts which will inform xive_finish_unmask() that an EOI needs to be sent. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Suggested-by: Benjamin Herrenschmidt Signed-off-by: Cédric Le Goater Reviewed-by: Laurent Vivier Tested-by: Laurent Vivier Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/book3s_xive.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c index bf457843e032..b5e6d227a034 100644 --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -1558,7 +1558,7 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr) /* * Restore P and Q. If the interrupt was pending, we - * force both P and Q, which will trigger a resend. + * force Q and !P, which will trigger a resend. * * That means that a guest that had both an interrupt * pending (queued) and Q set will restore with only @@ -1566,7 +1566,7 @@ static int xive_set_source(struct kvmppc_xive *xive, long irq, u64 addr) * is perfectly fine as coalescing interrupts that haven't * been presented yet is always allowed. 
*/ - if (val & KVM_XICS_PRESENTED || val & KVM_XICS_PENDING) + if (val & KVM_XICS_PRESENTED && !(val & KVM_XICS_PENDING)) state->old_p = true; if (val & KVM_XICS_QUEUED || val & KVM_XICS_PENDING) state->old_q = true; From 5aa30b450a8b8b6c8a0921e00e900c0ab9b8c4ae Mon Sep 17 00:00:00 2001 From: Laurent Vivier Date: Tue, 12 Dec 2017 18:23:56 +0100 Subject: [PATCH 0790/1013] KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp() commit 7333b5aca412d6ad02667b5a513485838a91b136 upstream. When we migrate a VM from a POWER8 host (XICS) to a POWER9 host (XICS-on-XIVE), we have an error: qemu-kvm: Unable to restore KVM interrupt controller state \ (0xff000000) for CPU 0: Invalid argument This is because kvmppc_xics_set_icp() checks that the new state is internally consistent, and especially: ... 1129 if (xisr == 0) { 1130 if (pending_pri != 0xff) 1131 return -EINVAL; ... On the other side, kvmppc_xive_get_icp() sets neither the pending_pri value nor the xisr value (which is left at 0), and kvmppc_xive_set_icp() ignores the pending_pri value. As xisr is 0, pending_pri must be set to 0xff. Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Signed-off-by: Laurent Vivier Acked-by: Benjamin Herrenschmidt Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/kvm/book3s_xive.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c index b5e6d227a034..0d750d274c4e 100644 --- a/arch/powerpc/kvm/book3s_xive.c +++ b/arch/powerpc/kvm/book3s_xive.c @@ -725,7 +725,8 @@ u64 kvmppc_xive_get_icp(struct kvm_vcpu *vcpu) /* Return the per-cpu state for state saving/migration */ return (u64)xc->cppr << KVM_REG_PPC_ICP_CPPR_SHIFT | - (u64)xc->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT; + (u64)xc->mfrr << KVM_REG_PPC_ICP_MFRR_SHIFT | + (u64)0xff << KVM_REG_PPC_ICP_PPRI_SHIFT; } int kvmppc_xive_set_icp(struct kvm_vcpu *vcpu, u64 icpval) From 41e1386388dc53bc0b9d7744fd1a99567d737fed Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Mon, 4 Dec 2017 22:21:30 -0800 Subject: [PATCH 0791/1013] KVM: MMU: Fix infinite loop when there is no available mmu page MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit ed52870f4676489124d8697fd00e6ae6c504e586 upstream. The test case below can cause an infinite loop in KVM when ept=0. #include #include #include #include #include #include #include long r[5]; int main() { r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); ioctl(r[4], KVM_RUN, 0); } It doesn't set up the memory regions; mmu_alloc_shadow/direct_roots() in KVM return 1 when KVM fails to allocate a root page table, which can result in the infinite loop below: vcpu_run() { for (;;) { r = vcpu_enter_guest()::kvm_mmu_reload() returns 1 if (r <= 0) break; if (need_resched()) cond_resched(); } } This patch fixes it by returning -ENOSPC when there is no available KVM MMU page for the root page table.
Cc: Paolo Bonzini Cc: Radim Krčmář Fixes: 26eeb53cf0f (KVM: MMU: Bail out immediately if there is no available mmu page) Signed-off-by: Wanpeng Li Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/mmu.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 13ebeedcec07..0fce8d73403c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -3382,7 +3382,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) spin_lock(&vcpu->kvm->mmu_lock); if(make_mmu_pages_available(vcpu) < 0) { spin_unlock(&vcpu->kvm->mmu_lock); - return 1; + return -ENOSPC; } sp = kvm_mmu_get_page(vcpu, 0, 0, vcpu->arch.mmu.shadow_root_level, 1, ACC_ALL); @@ -3397,7 +3397,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) spin_lock(&vcpu->kvm->mmu_lock); if (make_mmu_pages_available(vcpu) < 0) { spin_unlock(&vcpu->kvm->mmu_lock); - return 1; + return -ENOSPC; } sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT), i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL); @@ -3437,7 +3437,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) spin_lock(&vcpu->kvm->mmu_lock); if (make_mmu_pages_available(vcpu) < 0) { spin_unlock(&vcpu->kvm->mmu_lock); - return 1; + return -ENOSPC; } sp = kvm_mmu_get_page(vcpu, root_gfn, 0, vcpu->arch.mmu.shadow_root_level, 0, ACC_ALL); @@ -3474,7 +3474,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) spin_lock(&vcpu->kvm->mmu_lock); if (make_mmu_pages_available(vcpu) < 0) { spin_unlock(&vcpu->kvm->mmu_lock); - return 1; + return -ENOSPC; } sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL, 0, ACC_ALL); From 6cc3f6f10240df375f6d7457187b467c1817bdad Mon Sep 17 00:00:00 2001 From: Wanpeng Li Date: Thu, 7 Dec 2017 00:30:08 -0800 Subject: [PATCH 0792/1013] KVM: X86: Fix load RFLAGS w/o the fixed bit MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit d73235d17ba63b53dc0e1051dbc10a1f1be91b71 upstream. *** Guest State *** CR0: actual=0x0000000000000030, shadow=0x0000000060000010, gh_mask=fffffffffffffff7 CR4: actual=0x0000000000002050, shadow=0x0000000000000000, gh_mask=ffffffffffffe871 CR3 = 0x00000000fffbc000 RSP = 0x0000000000000000 RIP = 0x0000000000000000 RFLAGS=0x00000000 DR7 = 0x0000000000000400 ^^^^^^^^^^ The failed vmentry is triggered by the following testcase when ept=Y: #include #include #include #include #include #include #include long r[5]; int main() { r[2] = open("/dev/kvm", O_RDONLY); r[3] = ioctl(r[2], KVM_CREATE_VM, 0); r[4] = ioctl(r[3], KVM_CREATE_VCPU, 7); struct kvm_regs regs = { .rflags = 0, }; ioctl(r[4], KVM_SET_REGS, ®s); ioctl(r[4], KVM_RUN, 0); } X86 RFLAGS bit 1 is fixed set, userspace can simply clearing bit 1 of RFLAGS with KVM_SET_REGS ioctl which results in vmentry fails. This patch fixes it by oring X86_EFLAGS_FIXED during ioctl. 
Suggested-by: Jim Mattson Reviewed-by: David Hildenbrand Reviewed-by: Quan Xu Cc: Paolo Bonzini Cc: Radim Krčmář Cc: Jim Mattson Signed-off-by: Wanpeng Li Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/x86.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index df62cdc7a258..075619a92ce7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -7359,7 +7359,7 @@ int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) #endif kvm_rip_write(vcpu, regs->rip); - kvm_set_rflags(vcpu, regs->rflags); + kvm_set_rflags(vcpu, regs->rflags | X86_EFLAGS_FIXED); vcpu->arch.exception.pending = false; From 6461005967ed1a92ad67f8d864fdb3f0794a8682 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 21 Dec 2017 00:49:14 +0100 Subject: [PATCH 0793/1013] kvm: x86: fix RSM when PCID is non-zero commit fae1a3e775cca8c3a9e0eb34443b310871a15a92 upstream. rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then CR4 & ~PCIDE, then CR0, then CR4. However, setting CR4.PCIDE fails if CR3[11:0] != 0. It's probably easier in the long run to replace rsm_enter_protected_mode() with an emulator callback that sets all the special registers (like KVM_SET_SREGS would do). For now, set the PCID field of CR3 only after CR4.PCIDE is 1. Reported-by: Laszlo Ersek Tested-by: Laszlo Ersek Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c Signed-off-by: Paolo Bonzini Signed-off-by: Greg Kroah-Hartman --- arch/x86/kvm/emulate.c | 32 +++++++++++++++++++++++++------- 1 file changed, 25 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index d90cdc77e077..7bbb5da2b49d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -2404,9 +2404,21 @@ static int rsm_load_seg_64(struct x86_emulate_ctxt *ctxt, u64 smbase, int n) } static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt, - u64 cr0, u64 cr4) + u64 cr0, u64 cr3, u64 cr4) { int bad; + u64 pcid; + + /* In order to later set CR4.PCIDE, CR3[11:0] must be zero. */ + pcid = 0; + if (cr4 & X86_CR4_PCIDE) { + pcid = cr3 & 0xfff; + cr3 &= ~0xfff; + } + + bad = ctxt->ops->set_cr(ctxt, 3, cr3); + if (bad) + return X86EMUL_UNHANDLEABLE; /* * First enable PAE, long mode needs it before CR0.PG = 1 is set. 
@@ -2425,6 +2437,12 @@ static int rsm_enter_protected_mode(struct x86_emulate_ctxt *ctxt, bad = ctxt->ops->set_cr(ctxt, 4, cr4); if (bad) return X86EMUL_UNHANDLEABLE; + if (pcid) { + bad = ctxt->ops->set_cr(ctxt, 3, cr3 | pcid); + if (bad) + return X86EMUL_UNHANDLEABLE; + } + } return X86EMUL_CONTINUE; @@ -2435,11 +2453,11 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase) struct desc_struct desc; struct desc_ptr dt; u16 selector; - u32 val, cr0, cr4; + u32 val, cr0, cr3, cr4; int i; cr0 = GET_SMSTATE(u32, smbase, 0x7ffc); - ctxt->ops->set_cr(ctxt, 3, GET_SMSTATE(u32, smbase, 0x7ff8)); + cr3 = GET_SMSTATE(u32, smbase, 0x7ff8); ctxt->eflags = GET_SMSTATE(u32, smbase, 0x7ff4) | X86_EFLAGS_FIXED; ctxt->_eip = GET_SMSTATE(u32, smbase, 0x7ff0); @@ -2481,14 +2499,14 @@ static int rsm_load_state_32(struct x86_emulate_ctxt *ctxt, u64 smbase) ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smbase, 0x7ef8)); - return rsm_enter_protected_mode(ctxt, cr0, cr4); + return rsm_enter_protected_mode(ctxt, cr0, cr3, cr4); } static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase) { struct desc_struct desc; struct desc_ptr dt; - u64 val, cr0, cr4; + u64 val, cr0, cr3, cr4; u32 base3; u16 selector; int i, r; @@ -2505,7 +2523,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase) ctxt->ops->set_dr(ctxt, 7, (val & DR7_VOLATILE) | DR7_FIXED_1); cr0 = GET_SMSTATE(u64, smbase, 0x7f58); - ctxt->ops->set_cr(ctxt, 3, GET_SMSTATE(u64, smbase, 0x7f50)); + cr3 = GET_SMSTATE(u64, smbase, 0x7f50); cr4 = GET_SMSTATE(u64, smbase, 0x7f48); ctxt->ops->set_smbase(ctxt, GET_SMSTATE(u32, smbase, 0x7f00)); val = GET_SMSTATE(u64, smbase, 0x7ed0); @@ -2533,7 +2551,7 @@ static int rsm_load_state_64(struct x86_emulate_ctxt *ctxt, u64 smbase) dt.address = GET_SMSTATE(u64, smbase, 0x7e68); ctxt->ops->set_gdt(ctxt, &dt); - r = rsm_enter_protected_mode(ctxt, cr0, cr4); + r = rsm_enter_protected_mode(ctxt, cr0, cr3, cr4); if (r != X86EMUL_CONTINUE) return r; From e2d769198ff7cb3eecb60a427706d498d87aa0a0 Mon Sep 17 00:00:00 2001 From: Chen-Yu Tsai Date: Mon, 18 Dec 2017 11:57:51 +0800 Subject: [PATCH 0794/1013] clk: sunxi: sun9i-mmc: Implement reset callback for reset controls commit 61d2f2a05765a5f57149efbd93e3e81a83cbc2c1 upstream. Our MMC host driver now issues a reset, instead of just deasserting the reset control, since commit c34eda69ad4c ("mmc: sunxi: Reset the device at probe time"). The sun9i-mmc clock driver does not support this, and will fail, which results in MMC not probing. This patch implements the reset callback by asserting the reset control, then deasserting it after a small delay. 
Fixes: 7a6fca879f59 ("clk: sunxi: Add driver for A80 MMC config clocks/resets") Signed-off-by: Chen-Yu Tsai Acked-by: Philipp Zabel Acked-by: Maxime Ripard Signed-off-by: Michael Turquette Link: lkml.kernel.org/r/20171218035751.20661-1-wens@csie.org Signed-off-by: Greg Kroah-Hartman --- drivers/clk/sunxi/clk-sun9i-mmc.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/drivers/clk/sunxi/clk-sun9i-mmc.c b/drivers/clk/sunxi/clk-sun9i-mmc.c index 6041bdba2e97..f69f9e8c6f38 100644 --- a/drivers/clk/sunxi/clk-sun9i-mmc.c +++ b/drivers/clk/sunxi/clk-sun9i-mmc.c @@ -16,6 +16,7 @@ #include #include +#include #include #include #include @@ -83,9 +84,20 @@ static int sun9i_mmc_reset_deassert(struct reset_controller_dev *rcdev, return 0; } +static int sun9i_mmc_reset_reset(struct reset_controller_dev *rcdev, + unsigned long id) +{ + sun9i_mmc_reset_assert(rcdev, id); + udelay(10); + sun9i_mmc_reset_deassert(rcdev, id); + + return 0; +} + static const struct reset_control_ops sun9i_mmc_reset_ops = { .assert = sun9i_mmc_reset_assert, .deassert = sun9i_mmc_reset_deassert, + .reset = sun9i_mmc_reset_reset, }; static int sun9i_a80_mmc_config_clk_probe(struct platform_device *pdev) From a971d10f67df27a16cce5084092e151dba3e037d Mon Sep 17 00:00:00 2001 From: Ravi Bangoria Date: Tue, 12 Dec 2017 17:59:15 +0530 Subject: [PATCH 0795/1013] powerpc/perf: Dereference BHRB entries safely commit f41d84dddc66b164ac16acf3f584c276146f1c48 upstream. It's theoretically possible that branch instructions recorded in BHRB (Branch History Rolling Buffer) entries have already been unmapped before they are processed by the kernel. Hence, trying to dereference such memory location will result in a crash. eg: Unable to handle kernel paging request for data at address 0xd000000019c41764 Faulting instruction address: 0xc000000000084a14 NIP [c000000000084a14] branch_target+0x4/0x70 LR [c0000000000eb828] record_and_restart+0x568/0x5c0 Call Trace: [c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable) [c0000000000ec378] perf_event_interrupt+0x298/0x460 [c000000000027964] performance_monitor_exception+0x54/0x70 [c000000000009ba4] performance_monitor_common+0x114/0x120 Fix it by deferefencing the addresses safely. Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB") Suggested-by: Naveen N. Rao Signed-off-by: Ravi Bangoria Reviewed-by: Naveen N. Rao [mpe: Use probe_kernel_read() which is clearer, tweak change log] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/perf/core-book3s.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c index 9e3da168d54c..b4209a68b85d 100644 --- a/arch/powerpc/perf/core-book3s.c +++ b/arch/powerpc/perf/core-book3s.c @@ -410,8 +410,12 @@ static __u64 power_pmu_bhrb_to(u64 addr) int ret; __u64 target; - if (is_kernel_addr(addr)) - return branch_target((unsigned int *)addr); + if (is_kernel_addr(addr)) { + if (probe_kernel_read(&instr, (void *)addr, sizeof(instr))) + return 0; + + return branch_target(&instr); + } /* Userspace: need copy instruction here then translate it */ pagefault_disable(); From 6209cb514d977e323a7dad1b0ee7e83fc906215d Mon Sep 17 00:00:00 2001 From: Chris Wilson Date: Mon, 4 Dec 2017 13:25:13 +0000 Subject: [PATCH 0796/1013] drm/i915: Flush pending GTT writes before unbinding commit 2797c4a11f373b2545c2398ccb02e362ee66a142 upstream. 
From the shrinker paths, we want to relinquish the GPU and GGTT access to the object, releasing the backing storage back to the system for swapout. As a part of that process we would unpin the pages, marking them for access by the CPU (for the swapout/swapin). However, if that process was interrupted after unbinding the vma, we missed a flush of the inflight GGTT writes before we made that GTT space available again for reuse, with the prospect that we would redirect them to another page. The bug dates back to the introduction of multiple GGTT vma, but the code itself dates to commit 02bef8f98d26 ("drm/i915: Unbind closed vma for i915_gem_object_unbind()"). Fixes: 02bef8f98d26 ("drm/i915: Unbind closed vma for i915_gem_object_unbind()") Fixes: c5ad54cf7dd8 ("drm/i915: Use partial view in mmap fault handler") Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Reviewed-by: Joonas Lahtinen Link: https://patchwork.freedesktop.org/patch/msgid/20171204132513.7303-1-chris@chris-wilson.co.uk (cherry picked from commit 5888fc9eac3c2ff96e76aeeb865fdb46ab2d711e) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/i915_gem.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index dc1faa49687d..3b2c0538e48d 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -325,17 +325,10 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj) * must wait for all rendering to complete to the object (as unbinding * must anyway), and retire the requests. */ - ret = i915_gem_object_wait(obj, - I915_WAIT_INTERRUPTIBLE | - I915_WAIT_LOCKED | - I915_WAIT_ALL, - MAX_SCHEDULE_TIMEOUT, - NULL); + ret = i915_gem_object_set_to_cpu_domain(obj, false); if (ret) return ret; - i915_gem_retire_requests(to_i915(obj->base.dev)); - while ((vma = list_first_entry_or_null(&obj->vma_list, struct i915_vma, obj_link))) { From e7681f90a45a8f0eb6da8c0dc68a8334e3ed2251 Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Thu, 7 Dec 2017 16:58:50 +0100 Subject: [PATCH 0797/1013] drm/sun4i: Fix error path handling commit 92411f6d7f1afcc95e54295d40e96a75385212ec upstream. The commit 4c7f16d14a33 ("drm/sun4i: Fix TCON clock and regmap initialization sequence") moved a bunch of logic around, but forgot to update the gotos after the introduction of the err_free_dotclock label. It means that if we fail at a point later than the one introduced in that commit, we'll just jump to the old label, which doesn't free the clock we created. This will result in a breakage as soon as someone tries to do something with that clock, since its resources will have been long reclaimed.
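A generic sketch of the cleanup-label pattern this fix relies on (illustrative C only, not the sun4i driver code; all names are invented). Each error label undoes everything set up before it, so a failure after a later setup step must jump to a later label:

#include <stdio.h>

static int do_a(void) { return 0; }
static int do_b(void) { return 0; }
static int do_c(void) { return -1; }	/* pretend the last step fails */
static void undo_a(void) { puts("undo A"); }
static void undo_b(void) { puts("undo B"); }

static int setup(void)
{
	int ret;

	ret = do_a();
	if (ret)
		return ret;

	ret = do_b();
	if (ret)
		goto err_undo_a;

	ret = do_c();
	if (ret)
		goto err_undo_b;	/* jumping to err_undo_a here would leak B */

	return 0;

err_undo_b:
	undo_b();
err_undo_a:
	undo_a();
	return ret;
}

int main(void)
{
	return setup() ? 1 : 0;
}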
Fixes: 4c7f16d14a33 ("drm/sun4i: Fix TCON clock and regmap initialization sequence") Reviewed-by: Chen-Yu Tsai Signed-off-by: Maxime Ripard Link: https://patchwork.freedesktop.org/patch/msgid/f83c1cebc731f0b4251f5ddd7b38c718cd79bb0b.1512662253.git-series.maxime.ripard@free-electrons.com Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/sun4i/sun4i_tcon.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/sun4i/sun4i_tcon.c b/drivers/gpu/drm/sun4i/sun4i_tcon.c index d9791292553e..7b909d814d38 100644 --- a/drivers/gpu/drm/sun4i/sun4i_tcon.c +++ b/drivers/gpu/drm/sun4i/sun4i_tcon.c @@ -567,12 +567,12 @@ static int sun4i_tcon_bind(struct device *dev, struct device *master, if (IS_ERR(tcon->crtc)) { dev_err(dev, "Couldn't create our CRTC\n"); ret = PTR_ERR(tcon->crtc); - goto err_free_clocks; + goto err_free_dotclock; } ret = sun4i_rgb_init(drm, tcon); if (ret < 0) - goto err_free_clocks; + goto err_free_dotclock; list_add_tail(&tcon->list, &drv->tcon_list); From 6d80b15a226e59ebbc3ca8c14b65885bc8ffb2b9 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Mon, 4 Dec 2017 14:07:43 -0800 Subject: [PATCH 0798/1013] libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment commit 41fce90f26333c4fa82e8e43b9ace86c4e8a0120 upstream. The following namespace configuration attempt: # ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f libndctl: ndctl_dax_enable: dax0.1: failed to enable Error: namespace0.0: failed to enable failed to reconfigure namespace: No such device or address ...fails when the backing memory range is not physically aligned to 1G: # cat /proc/iomem | grep Persistent 210000000-30fffffff : Persistent Memory (legacy) In the above example the 4G persistent memory range starts and ends on a 256MB boundary. We handle this case correctly when needing to handle cases that violate section alignment (128MB) collisions against "System RAM", and we simply need to extend that padding/truncation for the 1GB alignment use case. Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...") Reported-and-tested-by: Jane Chu Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- drivers/nvdimm/pfn_devs.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 65cc171c721d..5ba9ab7af736 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -582,6 +582,12 @@ static struct vmem_altmap *__nvdimm_setup_pfn(struct nd_pfn *nd_pfn, return altmap; } +static u64 phys_pmem_align_down(struct nd_pfn *nd_pfn, u64 phys) +{ + return min_t(u64, PHYS_SECTION_ALIGN_DOWN(phys), + ALIGN_DOWN(phys, nd_pfn->align)); +} + static int nd_pfn_init(struct nd_pfn *nd_pfn) { u32 dax_label_reserve = is_nd_dax(&nd_pfn->dev) ? 
SZ_128K : 0; @@ -637,13 +643,16 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn) start = nsio->res.start; size = PHYS_SECTION_ALIGN_UP(start + size) - start; if (region_intersects(start, size, IORESOURCE_SYSTEM_RAM, - IORES_DESC_NONE) == REGION_MIXED) { + IORES_DESC_NONE) == REGION_MIXED + || !IS_ALIGNED(start + resource_size(&nsio->res), + nd_pfn->align)) { size = resource_size(&nsio->res); - end_trunc = start + size - PHYS_SECTION_ALIGN_DOWN(start + size); + end_trunc = start + size - phys_pmem_align_down(nd_pfn, + start + size); } if (start_pad + end_trunc) - dev_info(&nd_pfn->dev, "%s section collision, truncate %d bytes\n", + dev_info(&nd_pfn->dev, "%s alignment collision, truncate %d bytes\n", dev_name(&ndns->dev), start_pad + end_trunc); /* From 166f39bc340d0eb739d31ddd347df593d7b6ff97 Mon Sep 17 00:00:00 2001 From: Vishal Verma Date: Mon, 18 Dec 2017 09:28:39 -0700 Subject: [PATCH 0799/1013] libnvdimm, btt: Fix an incompatibility in the log layout commit 24e3a7fb60a9187e5df90e5fa655ffc94b9c4f77 upstream. Due to a spec misinterpretation, the Linux implementation of the BTT log area had different padding scheme from other implementations, such as UEFI and NVML. This fixes the padding scheme, and defaults to it for new BTT layouts. We attempt to detect the padding scheme in use when probing for an existing BTT. If we detect the older/incompatible scheme, we continue using it. Reported-by: Juston Li Cc: Dan Williams Fixes: 5212e11fde4d ("nd_btt: atomic sector updates") Signed-off-by: Vishal Verma Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- drivers/nvdimm/btt.c | 201 +++++++++++++++++++++++++++++++++++-------- drivers/nvdimm/btt.h | 45 +++++++++- 2 files changed, 211 insertions(+), 35 deletions(-) diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c index d5612bd1cc81..09428ebd315b 100644 --- a/drivers/nvdimm/btt.c +++ b/drivers/nvdimm/btt.c @@ -210,12 +210,12 @@ static int btt_map_read(struct arena_info *arena, u32 lba, u32 *mapping, return ret; } -static int btt_log_read_pair(struct arena_info *arena, u32 lane, - struct log_entry *ent) +static int btt_log_group_read(struct arena_info *arena, u32 lane, + struct log_group *log) { return arena_read_bytes(arena, - arena->logoff + (2 * lane * LOG_ENT_SIZE), ent, - 2 * LOG_ENT_SIZE, 0); + arena->logoff + (lane * LOG_GRP_SIZE), log, + LOG_GRP_SIZE, 0); } static struct dentry *debugfs_root; @@ -255,6 +255,8 @@ static void arena_debugfs_init(struct arena_info *a, struct dentry *parent, debugfs_create_x64("logoff", S_IRUGO, d, &a->logoff); debugfs_create_x64("info2off", S_IRUGO, d, &a->info2off); debugfs_create_x32("flags", S_IRUGO, d, &a->flags); + debugfs_create_u32("log_index_0", S_IRUGO, d, &a->log_index[0]); + debugfs_create_u32("log_index_1", S_IRUGO, d, &a->log_index[1]); } static void btt_debugfs_init(struct btt *btt) @@ -273,6 +275,11 @@ static void btt_debugfs_init(struct btt *btt) } } +static u32 log_seq(struct log_group *log, int log_idx) +{ + return le32_to_cpu(log->ent[log_idx].seq); +} + /* * This function accepts two log entries, and uses the * sequence number to find the 'older' entry. @@ -282,8 +289,10 @@ static void btt_debugfs_init(struct btt *btt) * * TODO The logic feels a bit kludge-y. make it better.. 
*/ -static int btt_log_get_old(struct log_entry *ent) +static int btt_log_get_old(struct arena_info *a, struct log_group *log) { + int idx0 = a->log_index[0]; + int idx1 = a->log_index[1]; int old; /* @@ -291,23 +300,23 @@ static int btt_log_get_old(struct log_entry *ent) * the next time, the following logic works out to put this * (next) entry into [1] */ - if (ent[0].seq == 0) { - ent[0].seq = cpu_to_le32(1); + if (log_seq(log, idx0) == 0) { + log->ent[idx0].seq = cpu_to_le32(1); return 0; } - if (ent[0].seq == ent[1].seq) + if (log_seq(log, idx0) == log_seq(log, idx1)) return -EINVAL; - if (le32_to_cpu(ent[0].seq) + le32_to_cpu(ent[1].seq) > 5) + if (log_seq(log, idx0) + log_seq(log, idx1) > 5) return -EINVAL; - if (le32_to_cpu(ent[0].seq) < le32_to_cpu(ent[1].seq)) { - if (le32_to_cpu(ent[1].seq) - le32_to_cpu(ent[0].seq) == 1) + if (log_seq(log, idx0) < log_seq(log, idx1)) { + if ((log_seq(log, idx1) - log_seq(log, idx0)) == 1) old = 0; else old = 1; } else { - if (le32_to_cpu(ent[0].seq) - le32_to_cpu(ent[1].seq) == 1) + if ((log_seq(log, idx0) - log_seq(log, idx1)) == 1) old = 1; else old = 0; @@ -327,17 +336,18 @@ static int btt_log_read(struct arena_info *arena, u32 lane, { int ret; int old_ent, ret_ent; - struct log_entry log[2]; + struct log_group log; - ret = btt_log_read_pair(arena, lane, log); + ret = btt_log_group_read(arena, lane, &log); if (ret) return -EIO; - old_ent = btt_log_get_old(log); + old_ent = btt_log_get_old(arena, &log); if (old_ent < 0 || old_ent > 1) { dev_err(to_dev(arena), "log corruption (%d): lane %d seq [%d, %d]\n", - old_ent, lane, log[0].seq, log[1].seq); + old_ent, lane, log.ent[arena->log_index[0]].seq, + log.ent[arena->log_index[1]].seq); /* TODO set error state? */ return -EIO; } @@ -345,7 +355,7 @@ static int btt_log_read(struct arena_info *arena, u32 lane, ret_ent = (old_flag ? old_ent : (1 - old_ent)); if (ent != NULL) - memcpy(ent, &log[ret_ent], LOG_ENT_SIZE); + memcpy(ent, &log.ent[arena->log_index[ret_ent]], LOG_ENT_SIZE); return ret_ent; } @@ -359,17 +369,13 @@ static int __btt_log_write(struct arena_info *arena, u32 lane, u32 sub, struct log_entry *ent, unsigned long flags) { int ret; - /* - * Ignore the padding in log_entry for calculating log_half. - * The entry is 'committed' when we write the sequence number, - * and we want to ensure that that is the last thing written. 
- * We don't bother writing the padding as that would be extra - * media wear and write amplification - */ - unsigned int log_half = (LOG_ENT_SIZE - 2 * sizeof(u64)) / 2; - u64 ns_off = arena->logoff + (((2 * lane) + sub) * LOG_ENT_SIZE); + u32 group_slot = arena->log_index[sub]; + unsigned int log_half = LOG_ENT_SIZE / 2; void *src = ent; + u64 ns_off; + ns_off = arena->logoff + (lane * LOG_GRP_SIZE) + + (group_slot * LOG_ENT_SIZE); /* split the 16B write into atomic, durable halves */ ret = arena_write_bytes(arena, ns_off, src, log_half, flags); if (ret) @@ -452,7 +458,7 @@ static int btt_log_init(struct arena_info *arena) { size_t logsize = arena->info2off - arena->logoff; size_t chunk_size = SZ_4K, offset = 0; - struct log_entry log; + struct log_entry ent; void *zerobuf; int ret; u32 i; @@ -484,11 +490,11 @@ static int btt_log_init(struct arena_info *arena) } for (i = 0; i < arena->nfree; i++) { - log.lba = cpu_to_le32(i); - log.old_map = cpu_to_le32(arena->external_nlba + i); - log.new_map = cpu_to_le32(arena->external_nlba + i); - log.seq = cpu_to_le32(LOG_SEQ_INIT); - ret = __btt_log_write(arena, i, 0, &log, 0); + ent.lba = cpu_to_le32(i); + ent.old_map = cpu_to_le32(arena->external_nlba + i); + ent.new_map = cpu_to_le32(arena->external_nlba + i); + ent.seq = cpu_to_le32(LOG_SEQ_INIT); + ret = __btt_log_write(arena, i, 0, &ent, 0); if (ret) goto free; } @@ -593,6 +599,123 @@ static int btt_freelist_init(struct arena_info *arena) return 0; } +static bool ent_is_padding(struct log_entry *ent) +{ + return (ent->lba == 0) && (ent->old_map == 0) && (ent->new_map == 0) + && (ent->seq == 0); +} + +/* + * Detecting valid log indices: We read a log group (see the comments in btt.h + * for a description of a 'log_group' and its 'slots'), and iterate over its + * four slots. We expect that a padding slot will be all-zeroes, and use this + * to detect a padding slot vs. an actual entry. + * + * If a log_group is in the initial state, i.e. hasn't been used since the + * creation of this BTT layout, it will have three of the four slots with + * zeroes. We skip over these log_groups for the detection of log_index. If + * all log_groups are in the initial state (i.e. the BTT has never been + * written to), it is safe to assume the 'new format' of log entries in slots + * (0, 1). + */ +static int log_set_indices(struct arena_info *arena) +{ + bool idx_set = false, initial_state = true; + int ret, log_index[2] = {-1, -1}; + u32 i, j, next_idx = 0; + struct log_group log; + u32 pad_count = 0; + + for (i = 0; i < arena->nfree; i++) { + ret = btt_log_group_read(arena, i, &log); + if (ret < 0) + return ret; + + for (j = 0; j < 4; j++) { + if (!idx_set) { + if (ent_is_padding(&log.ent[j])) { + pad_count++; + continue; + } else { + /* Skip if index has been recorded */ + if ((next_idx == 1) && + (j == log_index[0])) + continue; + /* valid entry, record index */ + log_index[next_idx] = j; + next_idx++; + } + if (next_idx == 2) { + /* two valid entries found */ + idx_set = true; + } else if (next_idx > 2) { + /* too many valid indices */ + return -ENXIO; + } + } else { + /* + * once the indices have been set, just verify + * that all subsequent log groups are either in + * their initial state or follow the same + * indices. 
+ */ + if (j == log_index[0]) { + /* entry must be 'valid' */ + if (ent_is_padding(&log.ent[j])) + return -ENXIO; + } else if (j == log_index[1]) { + ; + /* + * log_index[1] can be padding if the + * lane never got used and it is still + * in the initial state (three 'padding' + * entries) + */ + } else { + /* entry must be invalid (padding) */ + if (!ent_is_padding(&log.ent[j])) + return -ENXIO; + } + } + } + /* + * If any of the log_groups have more than one valid, + * non-padding entry, then the we are no longer in the + * initial_state + */ + if (pad_count < 3) + initial_state = false; + pad_count = 0; + } + + if (!initial_state && !idx_set) + return -ENXIO; + + /* + * If all the entries in the log were in the initial state, + * assume new padding scheme + */ + if (initial_state) + log_index[1] = 1; + + /* + * Only allow the known permutations of log/padding indices, + * i.e. (0, 1), and (0, 2) + */ + if ((log_index[0] == 0) && ((log_index[1] == 1) || (log_index[1] == 2))) + ; /* known index possibilities */ + else { + dev_err(to_dev(arena), "Found an unknown padding scheme\n"); + return -ENXIO; + } + + arena->log_index[0] = log_index[0]; + arena->log_index[1] = log_index[1]; + dev_dbg(to_dev(arena), "log_index_0 = %d\n", log_index[0]); + dev_dbg(to_dev(arena), "log_index_1 = %d\n", log_index[1]); + return 0; +} + static int btt_rtt_init(struct arena_info *arena) { arena->rtt = kcalloc(arena->nfree, sizeof(u32), GFP_KERNEL); @@ -649,8 +772,7 @@ static struct arena_info *alloc_arena(struct btt *btt, size_t size, available -= 2 * BTT_PG_SIZE; /* The log takes a fixed amount of space based on nfree */ - logsize = roundup(2 * arena->nfree * sizeof(struct log_entry), - BTT_PG_SIZE); + logsize = roundup(arena->nfree * LOG_GRP_SIZE, BTT_PG_SIZE); available -= logsize; /* Calculate optimal split between map and data area */ @@ -667,6 +789,10 @@ static struct arena_info *alloc_arena(struct btt *btt, size_t size, arena->mapoff = arena->dataoff + datasize; arena->logoff = arena->mapoff + mapsize; arena->info2off = arena->logoff + logsize; + + /* Default log indices are (0,1) */ + arena->log_index[0] = 0; + arena->log_index[1] = 1; return arena; } @@ -757,6 +883,13 @@ static int discover_arenas(struct btt *btt) arena->external_lba_start = cur_nlba; parse_arena_meta(arena, super, cur_off); + ret = log_set_indices(arena); + if (ret) { + dev_err(to_dev(arena), + "Unable to deduce log/padding indices\n"); + goto out; + } + mutex_init(&arena->err_lock); ret = btt_freelist_init(arena); if (ret) diff --git a/drivers/nvdimm/btt.h b/drivers/nvdimm/btt.h index 578c2057524d..2609683c4167 100644 --- a/drivers/nvdimm/btt.h +++ b/drivers/nvdimm/btt.h @@ -27,6 +27,7 @@ #define MAP_ERR_MASK (1 << MAP_ERR_SHIFT) #define MAP_LBA_MASK (~((1 << MAP_TRIM_SHIFT) | (1 << MAP_ERR_SHIFT))) #define MAP_ENT_NORMAL 0xC0000000 +#define LOG_GRP_SIZE sizeof(struct log_group) #define LOG_ENT_SIZE sizeof(struct log_entry) #define ARENA_MIN_SIZE (1UL << 24) /* 16 MB */ #define ARENA_MAX_SIZE (1ULL << 39) /* 512 GB */ @@ -50,12 +51,52 @@ enum btt_init_state { INIT_READY }; +/* + * A log group represents one log 'lane', and consists of four log entries. + * Two of the four entries are valid entries, and the remaining two are + * padding. Due to an old bug in the padding location, we need to perform a + * test to determine the padding scheme being used, and use that scheme + * thereafter. 
+ * + * In kernels prior to 4.15, 'log group' would have actual log entries at + * indices (0, 2) and padding at indices (1, 3), where as the correct/updated + * format has log entries at indices (0, 1) and padding at indices (2, 3). + * + * Old (pre 4.15) format: + * +-----------------+-----------------+ + * | ent[0] | ent[1] | + * | 16B | 16B | + * | lba/old/new/seq | pad | + * +-----------------------------------+ + * | ent[2] | ent[3] | + * | 16B | 16B | + * | lba/old/new/seq | pad | + * +-----------------+-----------------+ + * + * New format: + * +-----------------+-----------------+ + * | ent[0] | ent[1] | + * | 16B | 16B | + * | lba/old/new/seq | lba/old/new/seq | + * +-----------------------------------+ + * | ent[2] | ent[3] | + * | 16B | 16B | + * | pad | pad | + * +-----------------+-----------------+ + * + * We detect during start-up which format is in use, and set + * arena->log_index[(0, 1)] with the detected format. + */ + struct log_entry { __le32 lba; __le32 old_map; __le32 new_map; __le32 seq; - __le64 padding[2]; +}; + +struct log_group { + struct log_entry ent[4]; }; struct btt_sb { @@ -125,6 +166,7 @@ struct aligned_lock { * @list: List head for list of arenas * @debugfs_dir: Debugfs dentry * @flags: Arena flags - may signify error states. + * @log_index: Indices of the valid log entries in a log_group * * arena_info is a per-arena handle. Once an arena is narrowed down for an * IO, this struct is passed around for the duration of the IO. @@ -157,6 +199,7 @@ struct arena_info { /* Arena flags */ u32 flags; struct mutex err_lock; + int log_index[2]; }; /** From 01b1a29e32c13c69c0fd8a70e6cd66ac91316862 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 19 Dec 2017 15:07:10 -0800 Subject: [PATCH 0800/1013] libnvdimm, pfn: fix start_pad handling for aligned namespaces commit 19deaa217bc04e83b59b5e8c8229eb0e53ad9efc upstream. The alignment checks at pfn driver startup fail to properly account for the 'start_pad' in the case where the namespace is misaligned relative to its internal alignment. This is typically triggered in 1G aligned namespace, but could theoretically trigger with small namespace alignments. 
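To make the failure mode concrete, here is a small worked example (all numbers are hypothetical, chosen only to mirror the report quoted next; res_start and start_pad stand in for nsio->res.start and pfn_sb->start_pad): the in-namespace data offset does not have to be aligned on its own, what matters is the absolute physical start of the data area, i.e. namespace start + start_pad + offset.

	/* Hypothetical values, not taken from any real system */
	unsigned long align = 0x40000000;	/* 1G, pfn_sb->align          */
	u64 offset          = 0x3c000000;	/* pfn_sb->dataoff            */
	u64 res_start       = 0x04000000;	/* nsio->res.start (made up)  */
	u32 start_pad       = 0;		/* pfn_sb->start_pad (made up)*/

	IS_ALIGNED(offset, align);                          /* false: old check rejects   */
	IS_ALIGNED(res_start + offset + start_pad, align);  /* true: layout is fine       */
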
When this triggers the kernel reports messages of the form: dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000 Fixes: 1ee6667cd8d1 ("libnvdimm, pfn, dax: fix initialization vs autodetect...") Reported-by: Jane Chu Signed-off-by: Dan Williams Signed-off-by: Greg Kroah-Hartman --- drivers/nvdimm/pfn_devs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c index 5ba9ab7af736..2adada1a5855 100644 --- a/drivers/nvdimm/pfn_devs.c +++ b/drivers/nvdimm/pfn_devs.c @@ -364,9 +364,9 @@ struct device *nd_pfn_create(struct nd_region *nd_region) int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) { u64 checksum, offset; - unsigned long align; enum nd_pfn_mode mode; struct nd_namespace_io *nsio; + unsigned long align, start_pad; struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb; struct nd_namespace_common *ndns = nd_pfn->ndns; const u8 *parent_uuid = nd_dev_to_uuid(&ndns->dev); @@ -410,6 +410,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) align = le32_to_cpu(pfn_sb->align); offset = le64_to_cpu(pfn_sb->dataoff); + start_pad = le32_to_cpu(pfn_sb->start_pad); if (align == 0) align = 1UL << ilog2(offset); mode = le32_to_cpu(pfn_sb->mode); @@ -468,7 +469,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char *sig) return -EBUSY; } - if ((align && !IS_ALIGNED(offset, align)) + if ((align && !IS_ALIGNED(nsio->res.start + offset + start_pad, align)) || !IS_ALIGNED(offset, PAGE_SIZE)) { dev_err(&nd_pfn->dev, "bad offset: %#llx dax disabled align: %#lx\n", From b7c0181c47c4369cfccd97bca7e9e211026aa94c Mon Sep 17 00:00:00 2001 From: Yelena Krivosheev Date: Tue, 19 Dec 2017 17:59:45 +0100 Subject: [PATCH 0801/1013] net: mvneta: clear interface link status on port disable commit 4423c18e466afdfb02a36ee8b9f901d144b3c607 upstream. When the port is connected to a PHY in polling mode (with a poll interval of 1 sec), the port and PHY link status must be synchronized so that link change events are not lost. [gregory.clement@free-electrons.com: add fixes tag] Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Signed-off-by: Yelena Krivosheev Tested-by: Dmitri Epshtein Signed-off-by: Gregory CLEMENT Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/marvell/mvneta.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index bc93b69cfd1e..16b2bfb2cf51 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -1214,6 +1214,10 @@ static void mvneta_port_disable(struct mvneta_port *pp) val &= ~MVNETA_GMAC0_PORT_ENABLE; mvreg_write(pp, MVNETA_GMAC_CTRL_0, val); + pp->link = 0; + pp->duplex = -1; + pp->speed = 0; + udelay(200); } From c00802d53ddf3724602c1da0344c6fcaf9001358 Mon Sep 17 00:00:00 2001 From: Yelena Krivosheev Date: Tue, 19 Dec 2017 17:59:46 +0100 Subject: [PATCH 0802/1013] net: mvneta: use proper rxq_number in loop on rx queues commit ca5902a6547f662419689ca28b3c29a772446caa upstream. When adding the RX queue association with each CPU, a typo was made in the mvneta_cleanup_rxqs() function. This patch fixes it. [gregory.clement@free-electrons.com: add commit log and fixes tag] Fixes: 2dcf75e2793c ("net: mvneta: Associate RX queues with each CPU") Signed-off-by: Yelena Krivosheev Tested-by: Dmitri Epshtein Signed-off-by: Gregory CLEMENT Signed-off-by: David S.
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 16b2bfb2cf51..1e0835655c93 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -3015,7 +3015,7 @@ static void mvneta_cleanup_rxqs(struct mvneta_port *pp) { int queue; - for (queue = 0; queue < txq_number; queue++) + for (queue = 0; queue < rxq_number; queue++) mvneta_rxq_deinit(pp, &pp->rxqs[queue]); } From 646809937c02396f06a38a38ac686eff1b5e85fb Mon Sep 17 00:00:00 2001 From: Yelena Krivosheev Date: Tue, 19 Dec 2017 17:59:47 +0100 Subject: [PATCH 0803/1013] net: mvneta: eliminate wrong call to handle rx descriptor error commit 2eecb2e04abb62ef8ea7b43e1a46bdb5b99d1bf8 upstream. There are few reasons in mvneta_rx_swbm() function when received packet is dropped. mvneta_rx_error() should be called only if error bit [16] is set in rx descriptor. [gregory.clement@free-electrons.com: add fixes tag] Fixes: dc35a10f68d3 ("net: mvneta: bm: add support for hardware buffer management") Signed-off-by: Yelena Krivosheev Tested-by: Dmitri Epshtein Signed-off-by: Gregory CLEMENT Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/marvell/mvneta.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c index 1e0835655c93..a539263cd79c 100644 --- a/drivers/net/ethernet/marvell/mvneta.c +++ b/drivers/net/ethernet/marvell/mvneta.c @@ -1962,9 +1962,9 @@ static int mvneta_rx_swbm(struct mvneta_port *pp, int rx_todo, if (!mvneta_rxq_desc_is_first_last(rx_status) || (rx_status & MVNETA_RXD_ERR_SUMMARY)) { + mvneta_rx_error(pp, rx_desc); err_drop_frame: dev->stats.rx_errors++; - mvneta_rx_error(pp, rx_desc); /* leave the descriptor untouched */ continue; } From af1eddcc176e36104f50abf32b14cb979385d9f6 Mon Sep 17 00:00:00 2001 From: John Einar Reitan Date: Sun, 24 Dec 2017 00:03:44 +0100 Subject: [PATCH 0804/1013] Revert "ipmi_si: fix memory leak on new_smi" This reverts commit c97e41076a298dbc4e910c33048e553658388eed, which incorrectly was taken from upstream c0a32fe13cd323ca9420500b16fd69589c9ba91e. The referenced memory leak doesn't exist on the 4.14 stable branch as the new logic of doing the kzalloc hasn't moved to this function. By adding this kfree we actually end up doing double kfree as all callers of smi_add does a kfree on error. Sample with SLAB_FREELIST_HARDENED=y: ipmi_si: Adding ACPI-specified kcs state machine IPMI System Interface driver. ipmi_si: probing via SPMI ipmi_si: SPMI: io 0xca2 regsize 1 spacing 1 irq 0 (NULL device *): SPMI-specified kcs state machine: duplicate ------------[ cut here ]------------ kernel BUG at mm/slub.c:295! 
invalid opcode: 0000 [#1] SMP Modules linked in: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.8-gentoo-r1 #5 Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.2 02/20/2015 task: ffff88080c208000 task.stack: ffffc90000020000 RIP: 0010:kfree+0xf5/0x157 RSP: 0000:ffffc90000023e58 EFLAGS: 00010246 RAX: ffff88080b2e6200 RBX: ffff88080b2e6200 RCX: ffff88080b2e6200 RDX: 000000000000008e RSI: ffff88082fc1cd60 RDI: ffff88080c003080 RBP: ffffc90000002808 R08: 000000000001cd60 R09: ffffffff814da10e R10: ffffea00202cb980 R11: 000000000000005c R12: ffffffff814da10e R13: 00000000ffffffed R14: ffffffff82317bd0 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff88082fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000002e09001 CR4: 00000000001606f0 Call Trace: init_ipmi_si+0x493/0x5c7 ? cleanup_ipmi_si+0x84/0x84 ? set_debug_rodata+0xc/0xc ? kthread+0x4c/0x11c do_one_initcall+0x94/0x13d ? set_debug_rodata+0xc/0xc kernel_init_freeable+0x112/0x18e ? rest_init+0xa0/0xa0 kernel_init+0x5/0xe1 ret_from_fork+0x22/0x30 Code: 24 18 49 8b 7a 30 48 8b 37 65 48 8b 56 08 65 48 03 35 3a 29 e2 7e 4c 3b 56 10 75 39 48 8b 0e 48 63 47 20 48 01 d8 48 39 cb 75 02 <0f> 0b 49 89 c0 4c 33 87 40 01 00 00 4c 31 c1 48 89 08 48 8d 4a ---[ end trace 4ac2e2c100842676 ]--- Signed-off-by: John Einar Reitan Signed-off-by: Greg Kroah-Hartman --- drivers/char/ipmi/ipmi_si_intf.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c index e1cbb78c6806..c04aa11f0e21 100644 --- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -3469,7 +3469,6 @@ static int add_smi(struct smi_info *new_smi) ipmi_addr_src_to_str(new_smi->addr_source), si_to_str[new_smi->si_type]); rv = -EBUSY; - kfree(new_smi); goto out_err; } } From b8ce8232fcc37fe7a97db79ea0a5f32098c25e72 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 29 Dec 2017 17:53:50 +0100 Subject: [PATCH 0805/1013] Linux 4.14.10 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index ed2132c6d286..9edfb78836a9 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 9 +SUBLEVEL = 10 EXTRAVERSION = NAME = Petit Gorille From 234bc12669a362773e84834ace52f1af7510196b Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 22 Dec 2017 20:38:57 -0500 Subject: [PATCH 0806/1013] tracing: Remove extra zeroing out of the ring buffer page commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream. The ring_buffer_read_page() takes care of zeroing out any extra data in the page that it returns. There's no need to zero it out again from the consumer. It was removed from one consumer of this function, but read_buffers_splice_read() did not remove it, and worse, it contained a nasty bug because of it. 
Fixes: 2711ca237a084 ("ring-buffer: Move zeroing out excess in page to ring buffer code") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 80de14973b42..75ae95356a71 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6769,7 +6769,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos, .spd_release = buffer_spd_release, }; struct buffer_ref *ref; - int entries, size, i; + int entries, i; ssize_t ret = 0; #ifdef CONFIG_TRACER_MAX_TRACE @@ -6823,14 +6823,6 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos, break; } - /* - * zero out any left over data, this is going to - * user land. - */ - size = ring_buffer_page_len(ref->page); - if (size < PAGE_SIZE) - memset(ref->page + size, 0, PAGE_SIZE - size); - page = virt_to_page(ref->page); spd.pages[i] = page; From 21a9c7346ef696161dacbbd9f47dabb0f062c4c8 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Tue, 26 Dec 2017 20:07:34 -0500 Subject: [PATCH 0807/1013] tracing: Fix possible double free on failure of allocating trace buffer commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream. Jing Xia and Chunyan Zhang reported that on failing to allocate part of the tracing buffer, memory is freed, but the pointers that point to them are not initialized back to NULL, and later paths may try to free the freed memory again. Jing and Chunyan fixed one of the locations that does this, but missed a spot. Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Reported-by: Jing Xia Reported-by: Chunyan Zhang Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace.c | 1 + 1 file changed, 1 insertion(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 75ae95356a71..68d87d2b3822 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7580,6 +7580,7 @@ allocate_trace_buffer(struct trace_array *tr, struct trace_buffer *buf, int size buf->data = alloc_percpu(struct trace_array_cpu); if (!buf->data) { ring_buffer_free(buf->buffer); + buf->buffer = NULL; return -ENOMEM; } From 98669825616890a96c9fc67c5f983484578e4dd9 Mon Sep 17 00:00:00 2001 From: Jing Xia Date: Tue, 26 Dec 2017 15:12:53 +0800 Subject: [PATCH 0808/1013] tracing: Fix crash when it fails to alloc ring buffer commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream. Double free of the ring buffer happens when it fails to alloc new ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured. The root cause is that the pointer is not set to NULL after the buffer is freed in allocate_trace_buffers(), and the freeing of the ring buffer is invoked again later if the pointer is not equal to Null, as: instance_mkdir() |-allocate_trace_buffers() |-allocate_trace_buffer(tr, &tr->trace_buffer...) |-allocate_trace_buffer(tr, &tr->max_buffer...) 
// allocate fail(-ENOMEM),first free // and the buffer pointer is not set to null |-ring_buffer_free(tr->trace_buffer.buffer) // out_free_tr |-free_trace_buffers() |-free_trace_buffer(&tr->trace_buffer); //if trace_buffer is not null, free again |-ring_buffer_free(buf->buffer) |-rb_free_cpu_buffer(buffer->buffers[cpu]) // ring_buffer_per_cpu is null, and // crash in ring_buffer_per_cpu->pages Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Signed-off-by: Jing Xia Signed-off-by: Chunyan Zhang Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/trace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 68d87d2b3822..76bcc80b893e 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7604,7 +7604,9 @@ static int allocate_trace_buffers(struct trace_array *tr, int size) allocate_snapshot ? size : 1); if (WARN_ON(ret)) { ring_buffer_free(tr->trace_buffer.buffer); + tr->trace_buffer.buffer = NULL; free_percpu(tr->trace_buffer.data); + tr->trace_buffer.data = NULL; return -ENOMEM; } tr->allocated_snapshot = allocate_snapshot; From 72a2beddcd3240047552de69ce45a28029c7e56c Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:33 +0100 Subject: [PATCH 0809/1013] x86/cpufeatures: Add X86_BUG_CPU_INSECURE commit a89f040fa34ec9cd682aed98b8f04e3c47d998bd upstream. Many x86 CPUs leak information to user space due to missing isolation of user space and kernel space page tables. There are many well documented ways to exploit that. The upcoming software mitigation of isolating the user and kernel space page tables needs a misfeature flag so code can be made runtime conditional. Add the BUG bit which indicates that the CPU is affected and a feature bit which indicates that the software mitigation is enabled. Assume for now that _ALL_ x86 CPUs are affected by this. Exceptions can be made later. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H.
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 3 ++- arch/x86/include/asm/disabled-features.h | 8 +++++++- arch/x86/kernel/cpu/common.c | 4 ++++ 3 files changed, 13 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 800104c8a3ed..d8ec834ea884 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -201,7 +201,7 @@ #define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ #define X86_FEATURE_SME ( 7*32+10) /* AMD Secure Memory Encryption */ - +#define X86_FEATURE_PTI ( 7*32+11) /* Kernel Page Table Isolation enabled */ #define X86_FEATURE_INTEL_PPIN ( 7*32+14) /* Intel Processor Inventory Number */ #define X86_FEATURE_INTEL_PT ( 7*32+15) /* Intel Processor Trace */ #define X86_FEATURE_AVX512_4VNNIW ( 7*32+16) /* AVX-512 Neural Network Instructions */ @@ -340,5 +340,6 @@ #define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ #define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ +#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */ #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index c10c9128f54e..e428e16dd822 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -44,6 +44,12 @@ # define DISABLE_LA57 (1<<(X86_FEATURE_LA57 & 31)) #endif +#ifdef CONFIG_PAGE_TABLE_ISOLATION +# define DISABLE_PTI 0 +#else +# define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) +#endif + /* * Make sure to add features to the correct mask */ @@ -54,7 +60,7 @@ #define DISABLED_MASK4 (DISABLE_PCID) #define DISABLED_MASK5 0 #define DISABLED_MASK6 0 -#define DISABLED_MASK7 0 +#define DISABLED_MASK7 (DISABLE_PTI) #define DISABLED_MASK8 0 #define DISABLED_MASK9 (DISABLE_MPX) #define DISABLED_MASK10 0 diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 8ddcfa4d4165..a9210f9b7cf8 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -898,6 +898,10 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) } setup_force_cpu_cap(X86_FEATURE_ALWAYS); + + /* Assume for now that ALL x86 CPUs are insecure */ + setup_force_cpu_bug(X86_BUG_CPU_INSECURE); + fpu__init_system(c); #ifdef CONFIG_X86_32 From acfee9b8e27e7b5d69276e8b804fed7ff5071c10 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:34 +0100 Subject: [PATCH 0810/1013] x86/mm/pti: Disable global pages if PAGE_TABLE_ISOLATION=y commit c313ec66317d421fb5768d78c56abed2dc862264 upstream. Global pages stay in the TLB across context switches. Since all contexts share the same kernel mapping, these mappings are marked as global pages so kernel entries in the TLB are not flushed out on a context switch. 
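In code terms this is a simplified sketch of the pre-PTI behaviour described above (not lifted verbatim from the patch; the real code lives in probe_page_size_mask(), shown in the hunk below): the kernel sets CR4.PGE and lets _PAGE_GLOBAL through __supported_pte_mask, so kernel TLB entries survive CR3 writes.

	/* Simplified sketch; cr4_set_bits() stands in for the boot-time variant */
	cr4_set_bits(X86_CR4_PGE);		/* allow global TLB entries      */
	__supported_pte_mask |= _PAGE_GLOBAL;	/* kernel PTEs get the G bit set */
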
But, even having these entries in the TLB opens up something that an attacker can use, such as the double-page-fault attack: http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf That means that even when PAGE_TABLE_ISOLATION switches page tables on return to user space the global pages would stay in the TLB cache. Disable global pages so that kernel TLB entries can be flushed before returning to user space. This way, all accesses to kernel addresses from userspace result in a TLB miss independent of the existence of a kernel mapping. Suppress global pages via the __supported_pte_mask. The user space mappings set PAGE_GLOBAL for the minimal kernel mappings which are required for entry/exit. These mappings are set up manually so the filtering does not take place. [ The __supported_pte_mask simplification was written by Thomas Gleixner. ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index a22c2b95e513..020223420308 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -161,6 +161,12 @@ struct map_range { static int page_size_mask; +static void enable_global_pages(void) +{ + if (!static_cpu_has(X86_FEATURE_PTI)) + __supported_pte_mask |= _PAGE_GLOBAL; +} + static void __init probe_page_size_mask(void) { /* @@ -179,11 +185,11 @@ static void __init probe_page_size_mask(void) cr4_set_bits_and_update_boot(X86_CR4_PSE); /* Enable PGE if available */ + __supported_pte_mask &= ~_PAGE_GLOBAL; if (boot_cpu_has(X86_FEATURE_PGE)) { cr4_set_bits_and_update_boot(X86_CR4_PGE); - __supported_pte_mask |= _PAGE_GLOBAL; - } else - __supported_pte_mask &= ~_PAGE_GLOBAL; + enable_global_pages(); + } /* Enable 1 GB linear kernel mappings if available: */ if (direct_gbpages && boot_cpu_has(X86_FEATURE_GBPAGES)) { From f3d2b767e912b11d146b9c9922bf28efeda0cdc7 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:35 +0100 Subject: [PATCH 0811/1013] x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching commit 8a09317b895f073977346779df52f67c1056d81d upstream. PAGE_TABLE_ISOLATION needs to switch to a different CR3 value when it enters the kernel and switch back when it exits. This essentially needs to be done before leaving assembly code. This is extra challenging because the switching context is tricky: the registers that can be clobbered can vary. It is also hard to store things on the stack because there is an established ABI (ptregs) or the stack is entirely unsafe to use. Establish a set of macros that allow changing to the user and kernel CR3 values. Interactions with SWAPGS: Previous versions of the PAGE_TABLE_ISOLATION code relied on having per-CPU scratch space to save/restore a register that can be used for the CR3 MOV. The %GS register is used to index into our per-CPU space, so SWAPGS *had* to be done before the CR3 switch. That scratch space is gone now, but the semantic that SWAPGS must be done before the CR3 MOV is retained. 
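As a rough C-level sketch of what the new SWITCH_TO_KERNEL_CR3 / SWITCH_TO_USER_CR3 macros in arch/x86/entry/calling.h boil down to (assuming the 8k, order-1 PGD pair described in the calling.h hunk below, where bit 12 of CR3 selects the half; kernel_cr3/user_cr3 are illustrative names only):

	#define PTI_SWITCH_MASK	(1UL << PAGE_SHIFT)	/* bit 12 */

	kernel_cr3 = cr3 & ~PTI_SWITCH_MASK;	/* SWITCH_TO_KERNEL_CR3: clear bit 12 */
	user_cr3   = cr3 |  PTI_SWITCH_MASK;	/* SWITCH_TO_USER_CR3:   set bit 12   */

	/* entry: SWAPGS first, then load kernel_cr3; exit: load user_cr3, then SWAPGS */
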
This is good to keep because it is not that hard to do and it allows to do things like add per-CPU debugging information. What this does in the NMI code is worth pointing out. NMIs can interrupt *any* context and they can also be nested with NMIs interrupting other NMIs. The comments below ".Lnmi_from_kernel" explain the format of the stack during this situation. Changing the format of this stack is hard. Instead of storing the old CR3 value on the stack, this depends on the *regular* register save/restore mechanism and then uses %r14 to keep CR3 during the NMI. It is callee-saved and will not be clobbered by the C NMI handlers that get called. [ PeterZ: ESPFIX optimization ] Based-on-code-from: Andy Lutomirski Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Reviewed-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/calling.h | 66 ++++++++++++++++++++++++++++++++ arch/x86/entry/entry_64.S | 45 +++++++++++++++++++--- arch/x86/entry/entry_64_compat.S | 24 +++++++++++- 3 files changed, 128 insertions(+), 7 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 3fd8bc560fae..a9d17a7686ab 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -1,6 +1,8 @@ /* SPDX-License-Identifier: GPL-2.0 */ #include #include +#include +#include /* @@ -187,6 +189,70 @@ For 32-bit we have the following conventions - kernel is built with #endif .endm +#ifdef CONFIG_PAGE_TABLE_ISOLATION + +/* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two halves: */ +#define PTI_SWITCH_MASK (1< in kernel */ SWAPGS xorl %ebx, %ebx -1: ret + +1: + SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg=%rax save_reg=%r14 + + ret END(paranoid_entry) /* @@ -1266,6 +1287,7 @@ ENTRY(paranoid_exit) testl %ebx, %ebx /* swapgs needed? */ jnz .Lparanoid_exit_no_swapgs TRACE_IRQS_IRETQ + RESTORE_CR3 save_reg=%r14 SWAPGS_UNSAFE_STACK jmp .Lparanoid_exit_restore .Lparanoid_exit_no_swapgs: @@ -1293,6 +1315,8 @@ ENTRY(error_entry) * from user mode due to an IRET fault. */ SWAPGS + /* We have user CR3. Change to kernel CR3. */ + SWITCH_TO_KERNEL_CR3 scratch_reg=%rax .Lerror_entry_from_usermode_after_swapgs: /* Put us onto the real thread stack. */ @@ -1339,6 +1363,7 @@ ENTRY(error_entry) * .Lgs_change's error handler with kernel gsbase. */ SWAPGS + SWITCH_TO_KERNEL_CR3 scratch_reg=%rax jmp .Lerror_entry_done .Lbstep_iret: @@ -1348,10 +1373,11 @@ ENTRY(error_entry) .Lerror_bad_iret: /* - * We came from an IRET to user mode, so we have user gsbase. - * Switch to kernel gsbase: + * We came from an IRET to user mode, so we have user + * gsbase and CR3. Switch to kernel gsbase and CR3: */ SWAPGS + SWITCH_TO_KERNEL_CR3 scratch_reg=%rax /* * Pretend that the exception came from user mode: set up pt_regs @@ -1383,6 +1409,10 @@ END(error_exit) /* * Runs on exception stack. Xen PV does not go through this path at all, * so we can use real assembly here. + * + * Registers: + * %r14: Used to save/restore the CR3 of the interrupted context + * when PAGE_TABLE_ISOLATION is in use. Do not clobber. 
*/ ENTRY(nmi) UNWIND_HINT_IRET_REGS @@ -1446,6 +1476,7 @@ ENTRY(nmi) swapgs cld + SWITCH_TO_KERNEL_CR3 scratch_reg=%rdx movq %rsp, %rdx movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp UNWIND_HINT_IRET_REGS base=%rdx offset=8 @@ -1698,6 +1729,8 @@ end_repeat_nmi: movq $-1, %rsi call do_nmi + RESTORE_CR3 save_reg=%r14 + testl %ebx, %ebx /* swapgs needed? */ jnz nmi_restore nmi_swapgs: diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 95ad40eb7eff..05238b29895e 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -49,6 +49,10 @@ ENTRY(entry_SYSENTER_compat) /* Interrupts are off on entry. */ SWAPGS + + /* We are about to clobber %rsp anyway, clobbering here is OK */ + SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp + movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp /* @@ -215,6 +219,12 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe) pushq $0 /* pt_regs->r14 = 0 */ pushq $0 /* pt_regs->r15 = 0 */ + /* + * We just saved %rdi so it is safe to clobber. It is not + * preserved during the C calls inside TRACE_IRQS_OFF anyway. + */ + SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi + /* * User mode is traced as though IRQs are on, and SYSENTER * turned them off. @@ -256,10 +266,22 @@ sysret32_from_system_call: * when the system call started, which is already known to user * code. We zero R8-R10 to avoid info leaks. */ + movq RSP-ORIG_RAX(%rsp), %rsp + + /* + * The original userspace %rsp (RSP-ORIG_RAX(%rsp)) is stored + * on the process stack which is not mapped to userspace and + * not readable after we SWITCH_TO_USER_CR3. Delay the CR3 + * switch until after after the last reference to the process + * stack. + * + * %r8 is zeroed before the sysret, thus safe to clobber. + */ + SWITCH_TO_USER_CR3 scratch_reg=%r8 + xorq %r8, %r8 xorq %r9, %r9 xorq %r10, %r10 - movq RSP-ORIG_RAX(%rsp), %rsp swapgs sysretl END(entry_SYSCALL_compat) From a4b07fb4e5a6aef3b87a6540cc04cf972525a723 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:36 +0100 Subject: [PATCH 0812/1013] x86/mm/pti: Add infrastructure for page table isolation commit aa8c6248f8c75acfd610fe15d8cae23cf70d9d09 upstream. Add the initial files for kernel page table isolation, with a minimal init function and the boot time detection for this misfeature. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- .../admin-guide/kernel-parameters.txt | 2 + arch/x86/boot/compressed/pagetable.c | 3 + arch/x86/entry/calling.h | 7 ++ arch/x86/include/asm/pti.h | 14 ++++ arch/x86/mm/Makefile | 7 +- arch/x86/mm/init.c | 2 + arch/x86/mm/pti.c | 84 +++++++++++++++++++ include/linux/pti.h | 11 +++ init/main.c | 3 + 9 files changed, 130 insertions(+), 3 deletions(-) create mode 100644 arch/x86/include/asm/pti.h create mode 100644 arch/x86/mm/pti.c create mode 100644 include/linux/pti.h diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 05496622b4ef..5dfd26265484 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2685,6 +2685,8 @@ steal time is computed, but won't influence scheduler behaviour + nopti [X86-64] Disable kernel page table isolation + nolapic [X86-32,APIC] Do not enable or use the local APIC. nolapic_timer [X86-32,APIC] Do not use the local APIC timer. diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c index 972319ff5b01..e691ff734cb5 100644 --- a/arch/x86/boot/compressed/pagetable.c +++ b/arch/x86/boot/compressed/pagetable.c @@ -23,6 +23,9 @@ */ #undef CONFIG_AMD_MEM_ENCRYPT +/* No PAGE_TABLE_ISOLATION support needed either: */ +#undef CONFIG_PAGE_TABLE_ISOLATION + #include "misc.h" /* These actually do the work of building the kernel identity maps. */ diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index a9d17a7686ab..3d3389a92c33 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -205,18 +205,23 @@ For 32-bit we have the following conventions - kernel is built with .endm .macro SWITCH_TO_KERNEL_CR3 scratch_reg:req + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI mov %cr3, \scratch_reg ADJUST_KERNEL_CR3 \scratch_reg mov \scratch_reg, %cr3 +.Lend_\@: .endm .macro SWITCH_TO_USER_CR3 scratch_reg:req + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI mov %cr3, \scratch_reg ADJUST_USER_CR3 \scratch_reg mov \scratch_reg, %cr3 +.Lend_\@: .endm .macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req + ALTERNATIVE "jmp .Ldone_\@", "", X86_FEATURE_PTI movq %cr3, \scratch_reg movq \scratch_reg, \save_reg /* @@ -233,11 +238,13 @@ For 32-bit we have the following conventions - kernel is built with .endm .macro RESTORE_CR3 save_reg:req + ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI /* * The CR3 write could be avoided when not changing its value, * but would require a CR3 read *and* a scratch register. 
*/ movq \save_reg, %cr3 +.Lend_\@: .endm #else /* CONFIG_PAGE_TABLE_ISOLATION=n: */ diff --git a/arch/x86/include/asm/pti.h b/arch/x86/include/asm/pti.h new file mode 100644 index 000000000000..0b5ef05b2d2d --- /dev/null +++ b/arch/x86/include/asm/pti.h @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _ASM_X86_PTI_H +#define _ASM_X86_PTI_H +#ifndef __ASSEMBLY__ + +#ifdef CONFIG_PAGE_TABLE_ISOLATION +extern void pti_init(void); +extern void pti_check_boottime_disable(void); +#else +static inline void pti_check_boottime_disable(void) { } +#endif + +#endif /* __ASSEMBLY__ */ +#endif /* _ASM_X86_PTI_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 2e0017af8f9b..52906808e277 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -43,9 +43,10 @@ obj-$(CONFIG_AMD_NUMA) += amdtopology.o obj-$(CONFIG_ACPI_NUMA) += srat.o obj-$(CONFIG_NUMA_EMU) += numa_emulation.o -obj-$(CONFIG_X86_INTEL_MPX) += mpx.o -obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o -obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o +obj-$(CONFIG_X86_INTEL_MPX) += mpx.o +obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o +obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o +obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_boot.o diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 020223420308..af75069fb116 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -20,6 +20,7 @@ #include #include #include +#include /* * We need to define the tracepoints somewhere, and tlb.c @@ -630,6 +631,7 @@ void __init init_mem_mapping(void) { unsigned long end; + pti_check_boottime_disable(); probe_page_size_mask(); setup_pcid(); diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c new file mode 100644 index 000000000000..375f23a758bc --- /dev/null +++ b/arch/x86/mm/pti.c @@ -0,0 +1,84 @@ +/* + * Copyright(c) 2017 Intel Corporation. All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of version 2 of the GNU General Public License as + * published by the Free Software Foundation. + * + * This program is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * General Public License for more details. 
+ * + * This code is based in part on work published here: + * + * https://github.com/IAIK/KAISER + * + * The original work was written by and and signed off by for the Linux + * kernel by: + * + * Signed-off-by: Richard Fellner + * Signed-off-by: Moritz Lipp + * Signed-off-by: Daniel Gruss + * Signed-off-by: Michael Schwarz + * + * Major changes to the original code by: Dave Hansen + * Mostly rewritten by Thomas Gleixner and + * Andy Lutomirsky + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#undef pr_fmt +#define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt + +static void __init pti_print_if_insecure(const char *reason) +{ + if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) + pr_info("%s\n", reason); +} + +void __init pti_check_boottime_disable(void) +{ + if (hypervisor_is_type(X86_HYPER_XEN_PV)) { + pti_print_if_insecure("disabled on XEN PV."); + return; + } + + if (cmdline_find_option_bool(boot_command_line, "nopti")) { + pti_print_if_insecure("disabled on command line."); + return; + } + + if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) + return; + + setup_force_cpu_cap(X86_FEATURE_PTI); +} + +/* + * Initialize kernel page table isolation + */ +void __init pti_init(void) +{ + if (!static_cpu_has(X86_FEATURE_PTI)) + return; + + pr_info("enabled\n"); +} diff --git a/include/linux/pti.h b/include/linux/pti.h new file mode 100644 index 000000000000..0174883a935a --- /dev/null +++ b/include/linux/pti.h @@ -0,0 +1,11 @@ +// SPDX-License-Identifier: GPL-2.0 +#ifndef _INCLUDE_PTI_H +#define _INCLUDE_PTI_H + +#ifdef CONFIG_PAGE_TABLE_ISOLATION +#include +#else +static inline void pti_init(void) { } +#endif + +#endif diff --git a/init/main.c b/init/main.c index 8a390f60ec81..b32ec72cdf3d 100644 --- a/init/main.c +++ b/init/main.c @@ -75,6 +75,7 @@ #include #include #include +#include #include #include #include @@ -506,6 +507,8 @@ static void __init mm_init(void) ioremap_huge_init(); /* Should be run before the first non-init thread is created */ init_espfix_bsp(); + /* Should be run after espfix64 is set up. */ + pti_init(); } asmlinkage __visible void __init start_kernel(void) From 8a2533407f4d60a43effadf7b62825d213cd678c Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Tue, 12 Dec 2017 14:39:52 +0100 Subject: [PATCH 0813/1013] x86/pti: Add the pti= cmdline option and documentation commit 41f4c20b57a4890ea7f56ff8717cc83fefb8d537 upstream. Keep the "nopti" optional for traditional reasons. [ tglx: Don't allow force on when running on XEN PV and made 'on' printout conditional ] Requested-by: Linus Torvalds Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Andy Lutomirsky Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Link: https://lkml.kernel.org/r/20171212133952.10177-1-bp@alien8.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- .../admin-guide/kernel-parameters.txt | 6 +++++ arch/x86/mm/pti.c | 26 ++++++++++++++++++- 2 files changed, 31 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 5dfd26265484..520fdec15bbb 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3255,6 +3255,12 @@ pt. [PARIDE] See Documentation/blockdev/paride.txt. + pti= [X86_64] + Control user/kernel address space isolation: + on - enable + off - disable + auto - default setting + pty.legacy_count= [KNL] Number of legacy pty's. Overwrites compiled-in default number. diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 375f23a758bc..a13f6b109865 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -54,21 +54,45 @@ static void __init pti_print_if_insecure(const char *reason) pr_info("%s\n", reason); } +static void __init pti_print_if_secure(const char *reason) +{ + if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) + pr_info("%s\n", reason); +} + void __init pti_check_boottime_disable(void) { + char arg[5]; + int ret; + if (hypervisor_is_type(X86_HYPER_XEN_PV)) { pti_print_if_insecure("disabled on XEN PV."); return; } + ret = cmdline_find_option(boot_command_line, "pti", arg, sizeof(arg)); + if (ret > 0) { + if (ret == 3 && !strncmp(arg, "off", 3)) { + pti_print_if_insecure("disabled on command line."); + return; + } + if (ret == 2 && !strncmp(arg, "on", 2)) { + pti_print_if_secure("force enabled on command line."); + goto enable; + } + if (ret == 4 && !strncmp(arg, "auto", 4)) + goto autosel; + } + if (cmdline_find_option_bool(boot_command_line, "nopti")) { pti_print_if_insecure("disabled on command line."); return; } +autosel: if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) return; - +enable: setup_force_cpu_cap(X86_FEATURE_PTI); } From b9feab7dcf86df222a405df3e0d95b85741a2d73 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:37 +0100 Subject: [PATCH 0814/1013] x86/mm/pti: Add mapping helper functions commit 61e9b3671007a5da8127955a1a3bda7e0d5f42e8 upstream. Add the pagetable helper functions do manage the separate user space page tables. [ tglx: Split out from the big combo kaiser patch. Folded Andys simplification and made it out of line as Boris suggested ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 6 +- arch/x86/include/asm/pgtable_64.h | 92 +++++++++++++++++++++++++++++++ arch/x86/mm/pti.c | 41 ++++++++++++++ 3 files changed, 138 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index f02de8bc1f72..99b10d187db1 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -909,7 +909,11 @@ static inline int pgd_none(pgd_t pgd) * pgd_offset() returns a (pgd_t *) * pgd_index() is used get the offset into the pgd page's array of pgd_t's; */ -#define pgd_offset(mm, address) ((mm)->pgd + pgd_index((address))) +#define pgd_offset_pgd(pgd, address) (pgd + pgd_index((address))) +/* + * a shortcut to get a pgd_t in a given mm + */ +#define pgd_offset(mm, address) pgd_offset_pgd((mm)->pgd, (address)) /* * a shortcut which implies the use of the kernel's pgd, instead * of a process's diff --git a/arch/x86/include/asm/pgtable_64.h b/arch/x86/include/asm/pgtable_64.h index e9f05331e732..81462e9a34f6 100644 --- a/arch/x86/include/asm/pgtable_64.h +++ b/arch/x86/include/asm/pgtable_64.h @@ -131,9 +131,97 @@ static inline pud_t native_pudp_get_and_clear(pud_t *xp) #endif } +#ifdef CONFIG_PAGE_TABLE_ISOLATION +/* + * All top-level PAGE_TABLE_ISOLATION page tables are order-1 pages + * (8k-aligned and 8k in size). The kernel one is at the beginning 4k and + * the user one is in the last 4k. To switch between them, you + * just need to flip the 12th bit in their addresses. + */ +#define PTI_PGTABLE_SWITCH_BIT PAGE_SHIFT + +/* + * This generates better code than the inline assembly in + * __set_bit(). + */ +static inline void *ptr_set_bit(void *ptr, int bit) +{ + unsigned long __ptr = (unsigned long)ptr; + + __ptr |= BIT(bit); + return (void *)__ptr; +} +static inline void *ptr_clear_bit(void *ptr, int bit) +{ + unsigned long __ptr = (unsigned long)ptr; + + __ptr &= ~BIT(bit); + return (void *)__ptr; +} + +static inline pgd_t *kernel_to_user_pgdp(pgd_t *pgdp) +{ + return ptr_set_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline pgd_t *user_to_kernel_pgdp(pgd_t *pgdp) +{ + return ptr_clear_bit(pgdp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline p4d_t *kernel_to_user_p4dp(p4d_t *p4dp) +{ + return ptr_set_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); +} + +static inline p4d_t *user_to_kernel_p4dp(p4d_t *p4dp) +{ + return ptr_clear_bit(p4dp, PTI_PGTABLE_SWITCH_BIT); +} +#endif /* CONFIG_PAGE_TABLE_ISOLATION */ + +/* + * Page table pages are page-aligned. The lower half of the top + * level is used for userspace and the top half for the kernel. + * + * Returns true for parts of the PGD that map userspace and + * false for the parts that map the kernel. + */ +static inline bool pgdp_maps_userspace(void *__ptr) +{ + unsigned long ptr = (unsigned long)__ptr; + + return (ptr & ~PAGE_MASK) < (PAGE_SIZE / 2); +} + +#ifdef CONFIG_PAGE_TABLE_ISOLATION +pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd); + +/* + * Take a PGD location (pgdp) and a pgd value that needs to be set there. + * Populates the user and returns the resulting PGD that must be set in + * the kernel copy of the page tables. 
+ */ +static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) +{ + if (!static_cpu_has(X86_FEATURE_PTI)) + return pgd; + return __pti_set_user_pgd(pgdp, pgd); +} +#else +static inline pgd_t pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) +{ + return pgd; +} +#endif + static inline void native_set_p4d(p4d_t *p4dp, p4d_t p4d) { +#if defined(CONFIG_PAGE_TABLE_ISOLATION) && !defined(CONFIG_X86_5LEVEL) + p4dp->pgd = pti_set_user_pgd(&p4dp->pgd, p4d.pgd); +#else *p4dp = p4d; +#endif } static inline void native_p4d_clear(p4d_t *p4d) @@ -147,7 +235,11 @@ static inline void native_p4d_clear(p4d_t *p4d) static inline void native_set_pgd(pgd_t *pgdp, pgd_t pgd) { +#ifdef CONFIG_PAGE_TABLE_ISOLATION + *pgdp = pti_set_user_pgd(pgdp, pgd); +#else *pgdp = pgd; +#endif } static inline void native_pgd_clear(pgd_t *pgd) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index a13f6b109865..69a983365392 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -96,6 +96,47 @@ void __init pti_check_boottime_disable(void) setup_force_cpu_cap(X86_FEATURE_PTI); } +pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) +{ + /* + * Changes to the high (kernel) portion of the kernelmode page + * tables are not automatically propagated to the usermode tables. + * + * Users should keep in mind that, unlike the kernelmode tables, + * there is no vmalloc_fault equivalent for the usermode tables. + * Top-level entries added to init_mm's usermode pgd after boot + * will not be automatically propagated to other mms. + */ + if (!pgdp_maps_userspace(pgdp)) + return pgd; + + /* + * The user page tables get the full PGD, accessible from + * userspace: + */ + kernel_to_user_pgdp(pgdp)->pgd = pgd.pgd; + + /* + * If this is normal user memory, make it NX in the kernel + * pagetables so that, if we somehow screw up and return to + * usermode with the kernel CR3 loaded, we'll get a page fault + * instead of allowing user code to execute with the wrong CR3. + * + * As exceptions, we don't set NX if: + * - _PAGE_USER is not set. This could be an executable + * EFI runtime mapping or something similar, and the kernel + * may execute from it + * - we don't have NX support + * - we're clearing the PGD (i.e. the new pgd is not present). + */ + if ((pgd.pgd & (_PAGE_USER|_PAGE_PRESENT)) == (_PAGE_USER|_PAGE_PRESENT) && + (__supported_pte_mask & _PAGE_NX)) + pgd.pgd |= _PAGE_NX; + + /* return the copy of the PGD we want the kernel to use: */ + return pgd; +} + /* * Initialize kernel page table isolation */ From ffcb80ad79e8c3d87b43f2c9ee4b9170c7ec12ea Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:38 +0100 Subject: [PATCH 0815/1013] x86/mm/pti: Allow NX poison to be set in p4d/pgd commit 1c4de1ff4fe50453b968579ee86fac3da80dd783 upstream. With PAGE_TABLE_ISOLATION the user portion of the kernel page tables is poisoned with the NX bit so if the entry code exits with the kernel page tables selected in CR3, userspace crashes. But doing so trips the p4d/pgd_bad() checks. Make sure it does not do that. Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 99b10d187db1..f4c42c6d350c 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -846,7 +846,12 @@ static inline pud_t *pud_offset(p4d_t *p4d, unsigned long address) static inline int p4d_bad(p4d_t p4d) { - return (p4d_flags(p4d) & ~(_KERNPG_TABLE | _PAGE_USER)) != 0; + unsigned long ignore_flags = _KERNPG_TABLE | _PAGE_USER; + + if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION)) + ignore_flags |= _PAGE_NX; + + return (p4d_flags(p4d) & ~ignore_flags) != 0; } #endif /* CONFIG_PGTABLE_LEVELS > 3 */ @@ -880,7 +885,12 @@ static inline p4d_t *p4d_offset(pgd_t *pgd, unsigned long address) static inline int pgd_bad(pgd_t pgd) { - return (pgd_flags(pgd) & ~_PAGE_USER) != _KERNPG_TABLE; + unsigned long ignore_flags = _PAGE_USER; + + if (IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION)) + ignore_flags |= _PAGE_NX; + + return (pgd_flags(pgd) & ~ignore_flags) != _KERNPG_TABLE; } static inline int pgd_none(pgd_t pgd) From 61fd4049e6760eb8832d4e0bec592d8a810b270f Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:39 +0100 Subject: [PATCH 0816/1013] x86/mm/pti: Allocate a separate user PGD commit d9e9a6418065bb376e5de8d93ce346939b9a37a6 upstream. Kernel page table isolation requires to have two PGDs. One for the kernel, which contains the full kernel mapping plus the user space mapping and one for user space which contains the user space mappings and the minimal set of kernel mappings which are required by the architecture to be able to transition from and to user space. Add the necessary preliminaries. [ tglx: Split out from the big kaiser dump. EFI fixup from Kirill ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgalloc.h | 11 +++++++++++ arch/x86/kernel/head_64.S | 30 +++++++++++++++++++++++++++--- arch/x86/mm/pgtable.c | 5 +++-- arch/x86/platform/efi/efi_64.c | 5 ++++- 4 files changed, 45 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h index 4b5e1eafada7..aff42e1da6ee 100644 --- a/arch/x86/include/asm/pgalloc.h +++ b/arch/x86/include/asm/pgalloc.h @@ -30,6 +30,17 @@ static inline void paravirt_release_p4d(unsigned long pfn) {} */ extern gfp_t __userpte_alloc_gfp; +#ifdef CONFIG_PAGE_TABLE_ISOLATION +/* + * Instead of one PGD, we acquire two PGDs. Being order-1, it is + * both 8k in size and 8k-aligned. That lets us just flip bit 12 + * in a pointer to swap between the two 4k halves. + */ +#define PGD_ALLOCATION_ORDER 1 +#else +#define PGD_ALLOCATION_ORDER 0 +#endif + /* * Allocate and free page tables. 
*/ diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 7dca675fe78d..04a625f0fcda 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -341,6 +341,27 @@ GLOBAL(early_recursion_flag) .balign PAGE_SIZE; \ GLOBAL(name) +#ifdef CONFIG_PAGE_TABLE_ISOLATION +/* + * Each PGD needs to be 8k long and 8k aligned. We do not + * ever go out to userspace with these, so we do not + * strictly *need* the second page, but this allows us to + * have a single set_pgd() implementation that does not + * need to worry about whether it has 4k or 8k to work + * with. + * + * This ensures PGDs are 8k long: + */ +#define PTI_USER_PGD_FILL 512 +/* This ensures they are 8k-aligned: */ +#define NEXT_PGD_PAGE(name) \ + .balign 2 * PAGE_SIZE; \ +GLOBAL(name) +#else +#define NEXT_PGD_PAGE(name) NEXT_PAGE(name) +#define PTI_USER_PGD_FILL 0 +#endif + /* Automate the creation of 1 to 1 mapping pmd entries */ #define PMDS(START, PERM, COUNT) \ i = 0 ; \ @@ -350,13 +371,14 @@ GLOBAL(name) .endr __INITDATA -NEXT_PAGE(early_top_pgt) +NEXT_PGD_PAGE(early_top_pgt) .fill 511,8,0 #ifdef CONFIG_X86_5LEVEL .quad level4_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC #else .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC #endif + .fill PTI_USER_PGD_FILL,8,0 NEXT_PAGE(early_dynamic_pgts) .fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0 @@ -364,13 +386,14 @@ NEXT_PAGE(early_dynamic_pgts) .data #if defined(CONFIG_XEN_PV) || defined(CONFIG_XEN_PVH) -NEXT_PAGE(init_top_pgt) +NEXT_PGD_PAGE(init_top_pgt) .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .org init_top_pgt + PGD_PAGE_OFFSET*8, 0 .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC .org init_top_pgt + PGD_START_KERNEL*8, 0 /* (2^48-(2*1024*1024*1024))/(2^39) = 511 */ .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NOENC + .fill PTI_USER_PGD_FILL,8,0 NEXT_PAGE(level3_ident_pgt) .quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NOENC @@ -381,8 +404,9 @@ NEXT_PAGE(level2_ident_pgt) */ PMDS(0, __PAGE_KERNEL_IDENT_LARGE_EXEC, PTRS_PER_PMD) #else -NEXT_PAGE(init_top_pgt) +NEXT_PGD_PAGE(init_top_pgt) .fill 512,8,0 + .fill PTI_USER_PGD_FILL,8,0 #endif #ifdef CONFIG_X86_5LEVEL diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c index 17ebc5a978cc..9b7bcbd33cc2 100644 --- a/arch/x86/mm/pgtable.c +++ b/arch/x86/mm/pgtable.c @@ -355,14 +355,15 @@ static inline void _pgd_free(pgd_t *pgd) kmem_cache_free(pgd_cache, pgd); } #else + static inline pgd_t *_pgd_alloc(void) { - return (pgd_t *)__get_free_page(PGALLOC_GFP); + return (pgd_t *)__get_free_pages(PGALLOC_GFP, PGD_ALLOCATION_ORDER); } static inline void _pgd_free(pgd_t *pgd) { - free_page((unsigned long)pgd); + free_pages((unsigned long)pgd, PGD_ALLOCATION_ORDER); } #endif /* CONFIG_X86_PAE */ diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c index 20fb31579b69..39c4b35ac7a4 100644 --- a/arch/x86/platform/efi/efi_64.c +++ b/arch/x86/platform/efi/efi_64.c @@ -195,6 +195,9 @@ static pgd_t *efi_pgd; * because we want to avoid inserting EFI region mappings (EFI_VA_END * to EFI_VA_START) into the standard kernel page tables. Everything * else can be shared, see efi_sync_low_kernel_mappings(). + * + * We don't want the pgd on the pgd_list and cannot use pgd_alloc() for the + * allocation. 
*/ int __init efi_alloc_page_tables(void) { @@ -207,7 +210,7 @@ int __init efi_alloc_page_tables(void) return 0; gfp_mask = GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO; - efi_pgd = (pgd_t *)__get_free_page(gfp_mask); + efi_pgd = (pgd_t *)__get_free_pages(gfp_mask, PGD_ALLOCATION_ORDER); if (!efi_pgd) return -ENOMEM; From 1bcd98df0f50b28a5fb40eeb2cb7a17a02820232 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:40 +0100 Subject: [PATCH 0817/1013] x86/mm/pti: Populate user PGD commit fc2fbc8512ed08d1de7720936fd7d2e4ce02c3a2 upstream. In clone_pgd_range() copy the init user PGDs which cover the kernel half of the address space, so a process has all the required kernel mappings visible. [ tglx: Split out from the big kaiser dump and folded Andys simplification ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index f4c42c6d350c..9aaeea687834 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1125,7 +1125,14 @@ static inline int pud_write(pud_t pud) */ static inline void clone_pgd_range(pgd_t *dst, pgd_t *src, int count) { - memcpy(dst, src, count * sizeof(pgd_t)); + memcpy(dst, src, count * sizeof(pgd_t)); +#ifdef CONFIG_PAGE_TABLE_ISOLATION + if (!static_cpu_has(X86_FEATURE_PTI)) + return; + /* Clone the user space pgd as well */ + memcpy(kernel_to_user_pgdp(dst), kernel_to_user_pgdp(src), + count * sizeof(pgd_t)); +#endif } #define PTE_SHIFT ilog2(PTRS_PER_PTE) From 9f006b0247234e6448e5fd04b5bcb0c56c968698 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:42 +0100 Subject: [PATCH 0818/1013] x86/mm/pti: Add functions to clone kernel PMDs commit 03f4424f348e8be95eb1bbeba09461cd7b867828 upstream. Provide infrastructure to: - find a kernel PMD for a mapping which must be visible to user space for the entry/exit code to work. - walk an address range and share the kernel PMD with it. This reuses a small part of the original KAISER patches to populate the user space page table. [ tglx: Made it universally usable so it can be used for any kind of shared mapping. Add a mechanism to clear specific bits in the user space visible PMD entry. Folded Andys simplifactions ] Originally-by: Dave Hansen Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/pti.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 127 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 69a983365392..d58bcee470fc 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -48,6 +48,11 @@ #undef pr_fmt #define pr_fmt(fmt) "Kernel/User page tables isolation: " fmt +/* Backporting helper */ +#ifndef __GFP_NOTRACK +#define __GFP_NOTRACK 0 +#endif + static void __init pti_print_if_insecure(const char *reason) { if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) @@ -137,6 +142,128 @@ pgd_t __pti_set_user_pgd(pgd_t *pgdp, pgd_t pgd) return pgd; } +/* + * Walk the user copy of the page tables (optionally) trying to allocate + * page table pages on the way down. + * + * Returns a pointer to a P4D on success, or NULL on failure. + */ +static p4d_t *pti_user_pagetable_walk_p4d(unsigned long address) +{ + pgd_t *pgd = kernel_to_user_pgdp(pgd_offset_k(address)); + gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO); + + if (address < PAGE_OFFSET) { + WARN_ONCE(1, "attempt to walk user address\n"); + return NULL; + } + + if (pgd_none(*pgd)) { + unsigned long new_p4d_page = __get_free_page(gfp); + if (!new_p4d_page) + return NULL; + + if (pgd_none(*pgd)) { + set_pgd(pgd, __pgd(_KERNPG_TABLE | __pa(new_p4d_page))); + new_p4d_page = 0; + } + if (new_p4d_page) + free_page(new_p4d_page); + } + BUILD_BUG_ON(pgd_large(*pgd) != 0); + + return p4d_offset(pgd, address); +} + +/* + * Walk the user copy of the page tables (optionally) trying to allocate + * page table pages on the way down. + * + * Returns a pointer to a PMD on success, or NULL on failure. + */ +static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address) +{ + gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO); + p4d_t *p4d = pti_user_pagetable_walk_p4d(address); + pud_t *pud; + + BUILD_BUG_ON(p4d_large(*p4d) != 0); + if (p4d_none(*p4d)) { + unsigned long new_pud_page = __get_free_page(gfp); + if (!new_pud_page) + return NULL; + + if (p4d_none(*p4d)) { + set_p4d(p4d, __p4d(_KERNPG_TABLE | __pa(new_pud_page))); + new_pud_page = 0; + } + if (new_pud_page) + free_page(new_pud_page); + } + + pud = pud_offset(p4d, address); + /* The user page tables do not use large mappings: */ + if (pud_large(*pud)) { + WARN_ON(1); + return NULL; + } + if (pud_none(*pud)) { + unsigned long new_pmd_page = __get_free_page(gfp); + if (!new_pmd_page) + return NULL; + + if (pud_none(*pud)) { + set_pud(pud, __pud(_KERNPG_TABLE | __pa(new_pmd_page))); + new_pmd_page = 0; + } + if (new_pmd_page) + free_page(new_pmd_page); + } + + return pmd_offset(pud, address); +} + +static void __init +pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear) +{ + unsigned long addr; + + /* + * Clone the populated PMDs which cover start to end. These PMD areas + * can have holes. 
+ */ + for (addr = start; addr < end; addr += PMD_SIZE) { + pmd_t *pmd, *target_pmd; + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + + pgd = pgd_offset_k(addr); + if (WARN_ON(pgd_none(*pgd))) + return; + p4d = p4d_offset(pgd, addr); + if (WARN_ON(p4d_none(*p4d))) + return; + pud = pud_offset(p4d, addr); + if (pud_none(*pud)) + continue; + pmd = pmd_offset(pud, addr); + if (pmd_none(*pmd)) + continue; + + target_pmd = pti_user_pagetable_walk_pmd(addr); + if (WARN_ON(!target_pmd)) + return; + + /* + * Copy the PMD. That is, the kernelmode and usermode + * tables will share the last-level page tables of this + * address range + */ + *target_pmd = pmd_clear_flags(*pmd, clear); + } +} + /* * Initialize kernel page table isolation */ From 35531133abf37ea2f00673a0953e397c286f7c7c Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:43 +0100 Subject: [PATCH 0819/1013] x86/mm/pti: Force entry through trampoline when PTI active commit 8d4b067895791ab9fdb1aadfc505f64d71239dd2 upstream. Force the entry through the trampoline only when PTI is active. Otherwise go through the normal entry code. Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/common.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index a9210f9b7cf8..f2a94dfb434e 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1339,7 +1339,10 @@ void syscall_init(void) (entry_SYSCALL_64_trampoline - _entry_trampoline); wrmsr(MSR_STAR, 0, (__USER32_CS << 16) | __KERNEL_CS); - wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline); + if (static_cpu_has(X86_FEATURE_PTI)) + wrmsrl(MSR_LSTAR, SYSCALL64_entry_trampoline); + else + wrmsrl(MSR_LSTAR, (unsigned long)entry_SYSCALL_64); #ifdef CONFIG_IA32_EMULATION wrmsrl(MSR_CSTAR, (unsigned long)entry_SYSCALL_compat); From fb9dfabb6e803801b9e88ee6c0d56d3b54531b95 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Mon, 4 Dec 2017 15:07:45 +0100 Subject: [PATCH 0820/1013] x86/mm/pti: Share cpu_entry_area with user space page tables commit f7cfbee91559ca7e3e961a00ffac921208a115ad upstream. Share the cpu entry area so the user space and kernel space page tables have the same P4D page. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/pti.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index d58bcee470fc..59290356f19f 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -264,6 +264,29 @@ pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear) } } +/* + * Clone a single p4d (i.e. 
a top-level entry on 4-level systems and a + * next-level entry on 5-level systems. + */ +static void __init pti_clone_p4d(unsigned long addr) +{ + p4d_t *kernel_p4d, *user_p4d; + pgd_t *kernel_pgd; + + user_p4d = pti_user_pagetable_walk_p4d(addr); + kernel_pgd = pgd_offset_k(addr); + kernel_p4d = p4d_offset(kernel_pgd, addr); + *user_p4d = *kernel_p4d; +} + +/* + * Clone the CPU_ENTRY_AREA into the user space visible page table. + */ +static void __init pti_clone_user_shared(void) +{ + pti_clone_p4d(CPU_ENTRY_AREA_BASE); +} + /* * Initialize kernel page table isolation */ @@ -273,4 +296,6 @@ void __init pti_init(void) return; pr_info("enabled\n"); + + pti_clone_user_shared(); } From 088baf5de12eb7660e20f3f4efc1cf270acff5f4 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:46 +0100 Subject: [PATCH 0821/1013] x86/entry: Align entry text section to PMD boundary commit 2f7412ba9c6af5ab16bdbb4a3fdb1dcd2b4fd3c2 upstream. The (irq)entry text must be visible in the user space page tables. To allow simple PMD based sharing, make the entry text PMD aligned. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/vmlinux.lds.S | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index d2a8b5a24a44..1e413a9326aa 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -61,11 +61,17 @@ jiffies_64 = jiffies; . = ALIGN(HPAGE_SIZE); \ __end_rodata_hpage_align = .; +#define ALIGN_ENTRY_TEXT_BEGIN . = ALIGN(PMD_SIZE); +#define ALIGN_ENTRY_TEXT_END . = ALIGN(PMD_SIZE); + #else #define X64_ALIGN_RODATA_BEGIN #define X64_ALIGN_RODATA_END +#define ALIGN_ENTRY_TEXT_BEGIN +#define ALIGN_ENTRY_TEXT_END + #endif PHDRS { @@ -102,8 +108,10 @@ SECTIONS CPUIDLE_TEXT LOCK_TEXT KPROBES_TEXT + ALIGN_ENTRY_TEXT_BEGIN ENTRY_TEXT IRQENTRY_TEXT + ALIGN_ENTRY_TEXT_END SOFTIRQENTRY_TEXT *(.fixup) *(.gnu.warning) From e08aa2f1988a7d4ded9c9674fe18857ee5c6fc53 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:47 +0100 Subject: [PATCH 0822/1013] x86/mm/pti: Share entry text PMD commit 6dc72c3cbca0580642808d677181cad4c6433893 upstream. Share the entry text PMD of the kernel mapping with the user space mapping. If large pages are enabled this is a single PMD entry and at the point where it is copied into the user page table the RW bit has not been cleared yet. Clear it right away so the user space visible map becomes RX. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/pti.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 59290356f19f..0e78797650a7 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -287,6 +287,15 @@ static void __init pti_clone_user_shared(void) pti_clone_p4d(CPU_ENTRY_AREA_BASE); } +/* + * Clone the populated PMDs of the entry and irqentry text and force it RO. + */ +static void __init pti_clone_entry_text(void) +{ + pti_clone_pmds((unsigned long) __entry_text_start, + (unsigned long) __irqentry_text_end, _PAGE_RW); +} + /* * Initialize kernel page table isolation */ @@ -298,4 +307,5 @@ void __init pti_init(void) pr_info("enabled\n"); pti_clone_user_shared(); + pti_clone_entry_text(); } From d230c1917f57c3beee2e0204a4c8c58999758b95 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Fri, 15 Dec 2017 22:08:18 +0100 Subject: [PATCH 0823/1013] x86/mm/pti: Map ESPFIX into user space commit 4b6bbe95b87966ba08999574db65c93c5e925a36 upstream. Map the ESPFIX pages into user space when PTI is enabled. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/pti.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 0e78797650a7..b1c38ef9fbbb 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -287,6 +287,16 @@ static void __init pti_clone_user_shared(void) pti_clone_p4d(CPU_ENTRY_AREA_BASE); } +/* + * Clone the ESPFIX P4D into the user space visinble page table + */ +static void __init pti_setup_espfix64(void) +{ +#ifdef CONFIG_X86_ESPFIX64 + pti_clone_p4d(ESPFIX_BASE_ADDR); +#endif +} + /* * Clone the populated PMDs of the entry and irqentry text and force it RO. */ @@ -308,4 +318,5 @@ void __init pti_init(void) pti_clone_user_shared(); pti_clone_entry_text(); + pti_setup_espfix64(); } From e0eb34665d2eecddfa7a1810b76fae52313c1286 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:07:49 +0100 Subject: [PATCH 0824/1013] x86/cpu_entry_area: Add debugstore entries to cpu_entry_area commit 10043e02db7f8a4161f76434931051e7d797a5f6 upstream. The Intel PEBS/BTS debug store is a design trainwreck as it expects virtual addresses which must be visible in any execution context. So it is required to make these mappings visible to user space when kernel page table isolation is active. Provide enough room for the buffer mappings in the cpu_entry_area so the buffers are available in the user space visible page tables. At the point where the kernel side entry area is populated there is no buffer available yet, but the kernel PMD must be populated. To achieve this set the entries for these buffers to non present. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/intel/ds.c | 5 ++-- arch/x86/events/perf_event.h | 21 ++-------------- arch/x86/include/asm/cpu_entry_area.h | 13 ++++++++++ arch/x86/include/asm/intel_ds.h | 36 +++++++++++++++++++++++++++ arch/x86/mm/cpu_entry_area.c | 27 ++++++++++++++++++++ 5 files changed, 81 insertions(+), 21 deletions(-) create mode 100644 arch/x86/include/asm/intel_ds.h diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 3674a4b6f8bd..6522f0279cb8 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -8,11 +8,12 @@ #include "../perf_event.h" +/* Waste a full page so it can be mapped into the cpu_entry_area */ +DEFINE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store); + /* The size of a BTS record in bytes: */ #define BTS_RECORD_SIZE 24 -#define BTS_BUFFER_SIZE (PAGE_SIZE << 4) -#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4) #define PEBS_FIXUP_SIZE PAGE_SIZE /* diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index f7aaadf9331f..373f9eda80b1 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -14,6 +14,8 @@ #include +#include + /* To enable MSR tracing please use the generic trace points. */ /* @@ -77,8 +79,6 @@ struct amd_nb { struct event_constraint event_constraints[X86_PMC_IDX_MAX]; }; -/* The maximal number of PEBS events: */ -#define MAX_PEBS_EVENTS 8 #define PEBS_COUNTER_MASK ((1ULL << MAX_PEBS_EVENTS) - 1) /* @@ -95,23 +95,6 @@ struct amd_nb { PERF_SAMPLE_TRANSACTION | PERF_SAMPLE_PHYS_ADDR | \ PERF_SAMPLE_REGS_INTR | PERF_SAMPLE_REGS_USER) -/* - * A debug store configuration. - * - * We only support architectures that use 64bit fields. - */ -struct debug_store { - u64 bts_buffer_base; - u64 bts_index; - u64 bts_absolute_maximum; - u64 bts_interrupt_threshold; - u64 pebs_buffer_base; - u64 pebs_index; - u64 pebs_absolute_maximum; - u64 pebs_interrupt_threshold; - u64 pebs_event_reset[MAX_PEBS_EVENTS]; -}; - #define PEBS_REGS \ (PERF_REG_X86_AX | \ PERF_REG_X86_BX | \ diff --git a/arch/x86/include/asm/cpu_entry_area.h b/arch/x86/include/asm/cpu_entry_area.h index 2fbc69a0916e..4a7884b8dca5 100644 --- a/arch/x86/include/asm/cpu_entry_area.h +++ b/arch/x86/include/asm/cpu_entry_area.h @@ -5,6 +5,7 @@ #include #include +#include /* * cpu_entry_area is a percpu region that contains things needed by the CPU @@ -40,6 +41,18 @@ struct cpu_entry_area { */ char exception_stacks[(N_EXCEPTION_STACKS - 1) * EXCEPTION_STKSZ + DEBUG_STKSZ]; #endif +#ifdef CONFIG_CPU_SUP_INTEL + /* + * Per CPU debug store for Intel performance monitoring. Wastes a + * full page at the moment. + */ + struct debug_store cpu_debug_store; + /* + * The actual PEBS/BTS buffers must be mapped to user space + * Reserve enough fixmap PTEs. 
+ */ + struct debug_store_buffers cpu_debug_buffers; +#endif }; #define CPU_ENTRY_AREA_SIZE (sizeof(struct cpu_entry_area)) diff --git a/arch/x86/include/asm/intel_ds.h b/arch/x86/include/asm/intel_ds.h new file mode 100644 index 000000000000..62a9f4966b42 --- /dev/null +++ b/arch/x86/include/asm/intel_ds.h @@ -0,0 +1,36 @@ +#ifndef _ASM_INTEL_DS_H +#define _ASM_INTEL_DS_H + +#include + +#define BTS_BUFFER_SIZE (PAGE_SIZE << 4) +#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4) + +/* The maximal number of PEBS events: */ +#define MAX_PEBS_EVENTS 8 + +/* + * A debug store configuration. + * + * We only support architectures that use 64bit fields. + */ +struct debug_store { + u64 bts_buffer_base; + u64 bts_index; + u64 bts_absolute_maximum; + u64 bts_interrupt_threshold; + u64 pebs_buffer_base; + u64 pebs_index; + u64 pebs_absolute_maximum; + u64 pebs_interrupt_threshold; + u64 pebs_event_reset[MAX_PEBS_EVENTS]; +} __aligned(PAGE_SIZE); + +DECLARE_PER_CPU_PAGE_ALIGNED(struct debug_store, cpu_debug_store); + +struct debug_store_buffers { + char bts_buffer[BTS_BUFFER_SIZE]; + char pebs_buffer[PEBS_BUFFER_SIZE]; +}; + +#endif diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c index fe814fd5e014..b9283cc27622 100644 --- a/arch/x86/mm/cpu_entry_area.c +++ b/arch/x86/mm/cpu_entry_area.c @@ -38,6 +38,32 @@ cea_map_percpu_pages(void *cea_vaddr, void *ptr, int pages, pgprot_t prot) cea_set_pte(cea_vaddr, per_cpu_ptr_to_phys(ptr), prot); } +static void percpu_setup_debug_store(int cpu) +{ +#ifdef CONFIG_CPU_SUP_INTEL + int npages; + void *cea; + + if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) + return; + + cea = &get_cpu_entry_area(cpu)->cpu_debug_store; + npages = sizeof(struct debug_store) / PAGE_SIZE; + BUILD_BUG_ON(sizeof(struct debug_store) % PAGE_SIZE != 0); + cea_map_percpu_pages(cea, &per_cpu(cpu_debug_store, cpu), npages, + PAGE_KERNEL); + + cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers; + /* + * Force the population of PMDs for not yet allocated per cpu + * memory like debug store buffers. + */ + npages = sizeof(struct debug_store_buffers) / PAGE_SIZE; + for (; npages; npages--, cea += PAGE_SIZE) + cea_set_pte(cea, 0, PAGE_NONE); +#endif +} + /* Setup the fixmap mappings only once per-processor */ static void __init setup_cpu_entry_area(int cpu) { @@ -109,6 +135,7 @@ static void __init setup_cpu_entry_area(int cpu) cea_set_pte(&get_cpu_entry_area(cpu)->entry_trampoline, __pa_symbol(_entry_trampoline), PAGE_KERNEL_RX); #endif + percpu_setup_debug_store(cpu); } static __init void setup_cpu_entry_area_ptes(void) From 8b82023b7fc2c3734aae23bc03ed5937e67f7388 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 4 Dec 2017 15:07:50 +0100 Subject: [PATCH 0825/1013] x86/events/intel/ds: Map debug buffers in cpu_entry_area commit c1961a4631daef4aeabee8e368b1b13e8f173c91 upstream. The BTS and PEBS buffers both have their virtual addresses programmed into the hardware. This means that any access to them is performed via the page tables. The times that the hardware accesses these are entirely dependent on how the performance monitoring hardware events are set up. In other words, there is no way for the kernel to tell when the hardware might access these buffers. To avoid perf crashes, place 'debug_store' allocate pages and map them into the cpu_entry_area. The PEBS fixup buffer does not need this treatment. 
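As a quick sanity check of the sizing above, here is a minimal userspace sketch (not kernel code) mirroring the two structures: struct debug_store holds sixteen u64 values padded out to a whole page, and each of the BTS/PEBS buffers is PAGE_SIZE << 4, so percpu_setup_debug_store() maps exactly one page for the DS area and reserves 32 PTEs for the buffers. Only the constants and field layout are copied from the hunks above; PAGE_SIZE is assumed to be 4096 and main() exists only to print the page counts.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE        4096UL             /* assumed x86 page size */
#define BTS_BUFFER_SIZE  (PAGE_SIZE << 4)   /* as in intel_ds.h above */
#define PEBS_BUFFER_SIZE (PAGE_SIZE << 4)
#define MAX_PEBS_EVENTS  8

/* Mirrors struct debug_store: sixteen u64 values, padded out to one page. */
struct debug_store {
	uint64_t bts_buffer_base, bts_index;
	uint64_t bts_absolute_maximum, bts_interrupt_threshold;
	uint64_t pebs_buffer_base, pebs_index;
	uint64_t pebs_absolute_maximum, pebs_interrupt_threshold;
	uint64_t pebs_event_reset[MAX_PEBS_EVENTS];
} __attribute__((aligned(PAGE_SIZE)));

/* Mirrors struct debug_store_buffers: the actual BTS and PEBS buffers. */
struct debug_store_buffers {
	char bts_buffer[BTS_BUFFER_SIZE];
	char pebs_buffer[PEBS_BUFFER_SIZE];
};

int main(void)
{
	/* Pages mapped PAGE_KERNEL for the DS area itself: expect 1. */
	printf("debug_store pages:         %lu\n",
	       (unsigned long)(sizeof(struct debug_store) / PAGE_SIZE));
	/* PTEs reserved as not-present for the buffers: expect 32. */
	printf("debug_store_buffers pages: %lu\n",
	       (unsigned long)(sizeof(struct debug_store_buffers) / PAGE_SIZE));
	return 0;
}

This should print 1 and 32, consistent with the BUILD_BUG_ON() in percpu_setup_debug_store() and with the loop that reserves not-present PTEs for the buffer area.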
[ tglx: Got rid of the kaiser_add_mapping() complication ] Signed-off-by: Hugh Dickins Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/intel/ds.c | 125 ++++++++++++++++++++++------------- arch/x86/events/perf_event.h | 2 + 2 files changed, 82 insertions(+), 45 deletions(-) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 6522f0279cb8..8f0aace08b87 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -3,6 +3,7 @@ #include #include +#include #include #include @@ -280,17 +281,52 @@ void fini_debug_store_on_cpu(int cpu) static DEFINE_PER_CPU(void *, insn_buffer); -static int alloc_pebs_buffer(int cpu) +static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot) { - struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds; + phys_addr_t pa; + size_t msz = 0; + + pa = virt_to_phys(addr); + for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE) + cea_set_pte(cea, pa, prot); +} + +static void ds_clear_cea(void *cea, size_t size) +{ + size_t msz = 0; + + for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE) + cea_set_pte(cea, 0, PAGE_NONE); +} + +static void *dsalloc_pages(size_t size, gfp_t flags, int cpu) +{ + unsigned int order = get_order(size); int node = cpu_to_node(cpu); - int max; - void *buffer, *ibuffer; + struct page *page; + + page = __alloc_pages_node(node, flags | __GFP_ZERO, order); + return page ? 
page_address(page) : NULL; +} + +static void dsfree_pages(const void *buffer, size_t size) +{ + if (buffer) + free_pages((unsigned long)buffer, get_order(size)); +} + +static int alloc_pebs_buffer(int cpu) +{ + struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu); + struct debug_store *ds = hwev->ds; + size_t bsiz = x86_pmu.pebs_buffer_size; + int max, node = cpu_to_node(cpu); + void *buffer, *ibuffer, *cea; if (!x86_pmu.pebs) return 0; - buffer = kzalloc_node(x86_pmu.pebs_buffer_size, GFP_KERNEL, node); + buffer = dsalloc_pages(bsiz, GFP_KERNEL, cpu); if (unlikely(!buffer)) return -ENOMEM; @@ -301,25 +337,27 @@ static int alloc_pebs_buffer(int cpu) if (x86_pmu.intel_cap.pebs_format < 2) { ibuffer = kzalloc_node(PEBS_FIXUP_SIZE, GFP_KERNEL, node); if (!ibuffer) { - kfree(buffer); + dsfree_pages(buffer, bsiz); return -ENOMEM; } per_cpu(insn_buffer, cpu) = ibuffer; } - - max = x86_pmu.pebs_buffer_size / x86_pmu.pebs_record_size; - - ds->pebs_buffer_base = (u64)(unsigned long)buffer; + hwev->ds_pebs_vaddr = buffer; + /* Update the cpu entry area mapping */ + cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer; + ds->pebs_buffer_base = (unsigned long) cea; + ds_update_cea(cea, buffer, bsiz, PAGE_KERNEL); ds->pebs_index = ds->pebs_buffer_base; - ds->pebs_absolute_maximum = ds->pebs_buffer_base + - max * x86_pmu.pebs_record_size; - + max = x86_pmu.pebs_record_size * (bsiz / x86_pmu.pebs_record_size); + ds->pebs_absolute_maximum = ds->pebs_buffer_base + max; return 0; } static void release_pebs_buffer(int cpu) { - struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds; + struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu); + struct debug_store *ds = hwev->ds; + void *cea; if (!ds || !x86_pmu.pebs) return; @@ -327,73 +365,70 @@ static void release_pebs_buffer(int cpu) kfree(per_cpu(insn_buffer, cpu)); per_cpu(insn_buffer, cpu) = NULL; - kfree((void *)(unsigned long)ds->pebs_buffer_base); + /* Clear the fixmap */ + cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.pebs_buffer; + ds_clear_cea(cea, x86_pmu.pebs_buffer_size); ds->pebs_buffer_base = 0; + dsfree_pages(hwev->ds_pebs_vaddr, x86_pmu.pebs_buffer_size); + hwev->ds_pebs_vaddr = NULL; } static int alloc_bts_buffer(int cpu) { - struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds; - int node = cpu_to_node(cpu); - int max, thresh; - void *buffer; + struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu); + struct debug_store *ds = hwev->ds; + void *buffer, *cea; + int max; if (!x86_pmu.bts) return 0; - buffer = kzalloc_node(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, node); + buffer = dsalloc_pages(BTS_BUFFER_SIZE, GFP_KERNEL | __GFP_NOWARN, cpu); if (unlikely(!buffer)) { WARN_ONCE(1, "%s: BTS buffer allocation failure\n", __func__); return -ENOMEM; } - - max = BTS_BUFFER_SIZE / BTS_RECORD_SIZE; - thresh = max / 16; - - ds->bts_buffer_base = (u64)(unsigned long)buffer; + hwev->ds_bts_vaddr = buffer; + /* Update the fixmap */ + cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.bts_buffer; + ds->bts_buffer_base = (unsigned long) cea; + ds_update_cea(cea, buffer, BTS_BUFFER_SIZE, PAGE_KERNEL); ds->bts_index = ds->bts_buffer_base; - ds->bts_absolute_maximum = ds->bts_buffer_base + - max * BTS_RECORD_SIZE; - ds->bts_interrupt_threshold = ds->bts_absolute_maximum - - thresh * BTS_RECORD_SIZE; - + max = BTS_RECORD_SIZE * (BTS_BUFFER_SIZE / BTS_RECORD_SIZE); + ds->bts_absolute_maximum = ds->bts_buffer_base + max; + ds->bts_interrupt_threshold = ds->bts_absolute_maximum - (max / 16); return 0; } static void 
release_bts_buffer(int cpu) { - struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds; + struct cpu_hw_events *hwev = per_cpu_ptr(&cpu_hw_events, cpu); + struct debug_store *ds = hwev->ds; + void *cea; if (!ds || !x86_pmu.bts) return; - kfree((void *)(unsigned long)ds->bts_buffer_base); + /* Clear the fixmap */ + cea = &get_cpu_entry_area(cpu)->cpu_debug_buffers.bts_buffer; + ds_clear_cea(cea, BTS_BUFFER_SIZE); ds->bts_buffer_base = 0; + dsfree_pages(hwev->ds_bts_vaddr, BTS_BUFFER_SIZE); + hwev->ds_bts_vaddr = NULL; } static int alloc_ds_buffer(int cpu) { - int node = cpu_to_node(cpu); - struct debug_store *ds; - - ds = kzalloc_node(sizeof(*ds), GFP_KERNEL, node); - if (unlikely(!ds)) - return -ENOMEM; + struct debug_store *ds = &get_cpu_entry_area(cpu)->cpu_debug_store; + memset(ds, 0, sizeof(*ds)); per_cpu(cpu_hw_events, cpu).ds = ds; - return 0; } static void release_ds_buffer(int cpu) { - struct debug_store *ds = per_cpu(cpu_hw_events, cpu).ds; - - if (!ds) - return; - per_cpu(cpu_hw_events, cpu).ds = NULL; - kfree(ds); } void release_ds_buffers(void) diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 373f9eda80b1..8e4ea143ed96 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -199,6 +199,8 @@ struct cpu_hw_events { * Intel DebugStore bits */ struct debug_store *ds; + void *ds_pebs_vaddr; + void *ds_bts_vaddr; u64 pebs_enabled; int n_pebs; int n_large_pebs; From c125107490107fc4ce6bad0d9b45fd5e33bfa3f3 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 12 Dec 2017 07:56:44 -0800 Subject: [PATCH 0826/1013] x86/mm/64: Make a full PGD-entry size hole in the memory map commit 9f449772a3106bcdd4eb8fdeb281147b0e99fb30 upstream. Shrink vmalloc space from 16384TiB to 12800TiB to enlarge the hole starting at 0xff90000000000000 to be a full PGD entry. A subsequent patch will use this hole for the pagetable isolation LDT alias. Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 4 ++-- arch/x86/include/asm/pgtable_64_types.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 51101708a03a..496a1dbf139d 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -29,8 +29,8 @@ Virtual memory map with 5 level page tables: hole caused by [56:63] sign extension ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. memory -ff90000000000000 - ff91ffffffffffff (=49 bits) hole -ff92000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space +ff90000000000000 - ff9fffffffffffff (=52 bits) hole +ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB) ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) ... unused hole ... 
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 3d27831bc58d..83e9489ae944 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -79,8 +79,8 @@ typedef struct { pteval_t pte; } pte_t; #define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL) #ifdef CONFIG_X86_5LEVEL -# define VMALLOC_SIZE_TB _AC(16384, UL) -# define __VMALLOC_BASE _AC(0xff92000000000000, UL) +# define VMALLOC_SIZE_TB _AC(12800, UL) +# define __VMALLOC_BASE _AC(0xffa0000000000000, UL) # define __VMEMMAP_BASE _AC(0xffd4000000000000, UL) #else # define VMALLOC_SIZE_TB _AC(32, UL) From 7aef823ee7e9a74de0d2665d116bde557aa134ca Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 12 Dec 2017 07:56:45 -0800 Subject: [PATCH 0827/1013] x86/pti: Put the LDT in its own PGD if PTI is on commit f55f0501cbf65ec41cca5058513031b711730b1d upstream. With PTI enabled, the LDT must be mapped in the usermode tables somewhere. The LDT is per process, i.e. per mm. An earlier approach mapped the LDT on context switch into a fixmap area, but that's a big overhead and exhausted the fixmap space when NR_CPUS got big. Take advantage of the fact that there is an address space hole which provides a completely unused pgd. Use this pgd to manage per-mm LDT mappings. This has a down side: the LDT isn't (currently) randomized, and an attack that can write the LDT is instant root due to call gates (thanks, AMD, for leaving call gates in AMD64 but designing them wrong so they're only useful for exploits). This can be mitigated by making the LDT read-only or randomizing the mapping, either of which is strightforward on top of this patch. This will significantly slow down LDT users, but that shouldn't matter for important workloads -- the LDT is only used by DOSEMU(2), Wine, and very old libc implementations. [ tglx: Cleaned it up. ] Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 3 +- arch/x86/include/asm/mmu_context.h | 59 +++++++++- arch/x86/include/asm/pgtable_64_types.h | 4 + arch/x86/include/asm/processor.h | 23 ++-- arch/x86/kernel/ldt.c | 139 +++++++++++++++++++++++- arch/x86/mm/dump_pagetables.c | 9 ++ 6 files changed, 220 insertions(+), 17 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index 496a1dbf139d..ad41b3813f0a 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -12,6 +12,7 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) ... unused hole ... ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB) ... unused hole ... +fffffe0000000000 - fffffe7fffffffff (=39 bits) LDT remap for PTI fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... @@ -29,7 +30,7 @@ Virtual memory map with 5 level page tables: hole caused by [56:63] sign extension ff00000000000000 - ff0fffffffffffff (=52 bits) guard hole, reserved for hypervisor ff10000000000000 - ff8fffffffffffff (=55 bits) direct mapping of all phys. 
memory -ff90000000000000 - ff9fffffffffffff (=52 bits) hole +ff90000000000000 - ff9fffffffffffff (=52 bits) LDT remap for PTI ffa0000000000000 - ffd1ffffffffffff (=54 bits) vmalloc/ioremap space (12800 TB) ffd2000000000000 - ffd3ffffffffffff (=49 bits) hole ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h index 5ede7cae1d67..c931b88982a0 100644 --- a/arch/x86/include/asm/mmu_context.h +++ b/arch/x86/include/asm/mmu_context.h @@ -50,10 +50,33 @@ struct ldt_struct { * call gates. On native, we could merge the ldt_struct and LDT * allocations, but it's not worth trying to optimize. */ - struct desc_struct *entries; - unsigned int nr_entries; + struct desc_struct *entries; + unsigned int nr_entries; + + /* + * If PTI is in use, then the entries array is not mapped while we're + * in user mode. The whole array will be aliased at the addressed + * given by ldt_slot_va(slot). We use two slots so that we can allocate + * and map, and enable a new LDT without invalidating the mapping + * of an older, still-in-use LDT. + * + * slot will be -1 if this LDT doesn't have an alias mapping. + */ + int slot; }; +/* This is a multiple of PAGE_SIZE. */ +#define LDT_SLOT_STRIDE (LDT_ENTRIES * LDT_ENTRY_SIZE) + +static inline void *ldt_slot_va(int slot) +{ +#ifdef CONFIG_X86_64 + return (void *)(LDT_BASE_ADDR + LDT_SLOT_STRIDE * slot); +#else + BUG(); +#endif +} + /* * Used for LDT copy/destruction. */ @@ -64,6 +87,7 @@ static inline void init_new_context_ldt(struct mm_struct *mm) } int ldt_dup_context(struct mm_struct *oldmm, struct mm_struct *mm); void destroy_context_ldt(struct mm_struct *mm); +void ldt_arch_exit_mmap(struct mm_struct *mm); #else /* CONFIG_MODIFY_LDT_SYSCALL */ static inline void init_new_context_ldt(struct mm_struct *mm) { } static inline int ldt_dup_context(struct mm_struct *oldmm, @@ -71,7 +95,8 @@ static inline int ldt_dup_context(struct mm_struct *oldmm, { return 0; } -static inline void destroy_context_ldt(struct mm_struct *mm) {} +static inline void destroy_context_ldt(struct mm_struct *mm) { } +static inline void ldt_arch_exit_mmap(struct mm_struct *mm) { } #endif static inline void load_mm_ldt(struct mm_struct *mm) @@ -96,10 +121,31 @@ static inline void load_mm_ldt(struct mm_struct *mm) * that we can see. */ - if (unlikely(ldt)) - set_ldt(ldt->entries, ldt->nr_entries); - else + if (unlikely(ldt)) { + if (static_cpu_has(X86_FEATURE_PTI)) { + if (WARN_ON_ONCE((unsigned long)ldt->slot > 1)) { + /* + * Whoops -- either the new LDT isn't mapped + * (if slot == -1) or is mapped into a bogus + * slot (if slot > 1). + */ + clear_LDT(); + return; + } + + /* + * If page table isolation is enabled, ldt->entries + * will not be mapped in the userspace pagetables. + * Tell the CPU to access the LDT through the alias + * at ldt_slot_va(ldt->slot). 
+ */ + set_ldt(ldt_slot_va(ldt->slot), ldt->nr_entries); + } else { + set_ldt(ldt->entries, ldt->nr_entries); + } + } else { clear_LDT(); + } #else clear_LDT(); #endif @@ -194,6 +240,7 @@ static inline int arch_dup_mmap(struct mm_struct *oldmm, struct mm_struct *mm) static inline void arch_exit_mmap(struct mm_struct *mm) { paravirt_arch_exit_mmap(mm); + ldt_arch_exit_mmap(mm); } #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 83e9489ae944..b97a539bcdee 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -82,10 +82,14 @@ typedef struct { pteval_t pte; } pte_t; # define VMALLOC_SIZE_TB _AC(12800, UL) # define __VMALLOC_BASE _AC(0xffa0000000000000, UL) # define __VMEMMAP_BASE _AC(0xffd4000000000000, UL) +# define LDT_PGD_ENTRY _AC(-112, UL) +# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT) #else # define VMALLOC_SIZE_TB _AC(32, UL) # define __VMALLOC_BASE _AC(0xffffc90000000000, UL) # define __VMEMMAP_BASE _AC(0xffffea0000000000, UL) +# define LDT_PGD_ENTRY _AC(-4, UL) +# define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT) #endif #ifdef CONFIG_RANDOMIZE_MEMORY diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 9e482d8b0b97..9c18da64daa9 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -851,13 +851,22 @@ static inline void spin_lock_prefetch(const void *x) #else /* - * User space process size. 47bits minus one guard page. The guard - * page is necessary on Intel CPUs: if a SYSCALL instruction is at - * the highest possible canonical userspace address, then that - * syscall will enter the kernel with a non-canonical return - * address, and SYSRET will explode dangerously. We avoid this - * particular problem by preventing anything from being mapped - * at the maximum canonical address. + * User space process size. This is the first address outside the user range. + * There are a few constraints that determine this: + * + * On Intel CPUs, if a SYSCALL instruction is at the highest canonical + * address, then that syscall will enter the kernel with a + * non-canonical return address, and SYSRET will explode dangerously. + * We avoid this particular problem by preventing anything executable + * from being mapped at the maximum canonical address. + * + * On AMD CPUs in the Ryzen family, there's a nasty bug in which the + * CPUs malfunction if they execute code from the highest canonical page. + * They'll speculate right off the end of the canonical space, and + * bad things happen. This is worked around in the same way as the + * Intel problem. + * + * With page table isolation enabled, we map the LDT in ... [stay tuned] */ #define TASK_SIZE_MAX ((1UL << __VIRTUAL_MASK_SHIFT) - PAGE_SIZE) diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index a6b5d62f45a7..9629c5d8267a 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -24,6 +24,7 @@ #include #include +#include #include #include #include @@ -51,13 +52,11 @@ static void refresh_ldt_segments(void) static void flush_ldt(void *__mm) { struct mm_struct *mm = __mm; - mm_context_t *pc; if (this_cpu_read(cpu_tlbstate.loaded_mm) != mm) return; - pc = &mm->context; - set_ldt(pc->ldt->entries, pc->ldt->nr_entries); + load_mm_ldt(mm); refresh_ldt_segments(); } @@ -94,10 +93,121 @@ static struct ldt_struct *alloc_ldt_struct(unsigned int num_entries) return NULL; } + /* The new LDT isn't aliased for PTI yet. 
*/ + new_ldt->slot = -1; + new_ldt->nr_entries = num_entries; return new_ldt; } +/* + * If PTI is enabled, this maps the LDT into the kernelmode and + * usermode tables for the given mm. + * + * There is no corresponding unmap function. Even if the LDT is freed, we + * leave the PTEs around until the slot is reused or the mm is destroyed. + * This is harmless: the LDT is always in ordinary memory, and no one will + * access the freed slot. + * + * If we wanted to unmap freed LDTs, we'd also need to do a flush to make + * it useful, and the flush would slow down modify_ldt(). + */ +static int +map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot) +{ +#ifdef CONFIG_PAGE_TABLE_ISOLATION + bool is_vmalloc, had_top_level_entry; + unsigned long va; + spinlock_t *ptl; + pgd_t *pgd; + int i; + + if (!static_cpu_has(X86_FEATURE_PTI)) + return 0; + + /* + * Any given ldt_struct should have map_ldt_struct() called at most + * once. + */ + WARN_ON(ldt->slot != -1); + + /* + * Did we already have the top level entry allocated? We can't + * use pgd_none() for this because it doens't do anything on + * 4-level page table kernels. + */ + pgd = pgd_offset(mm, LDT_BASE_ADDR); + had_top_level_entry = (pgd->pgd != 0); + + is_vmalloc = is_vmalloc_addr(ldt->entries); + + for (i = 0; i * PAGE_SIZE < ldt->nr_entries * LDT_ENTRY_SIZE; i++) { + unsigned long offset = i << PAGE_SHIFT; + const void *src = (char *)ldt->entries + offset; + unsigned long pfn; + pte_t pte, *ptep; + + va = (unsigned long)ldt_slot_va(slot) + offset; + pfn = is_vmalloc ? vmalloc_to_pfn(src) : + page_to_pfn(virt_to_page(src)); + /* + * Treat the PTI LDT range as a *userspace* range. + * get_locked_pte() will allocate all needed pagetables + * and account for them in this mm. + */ + ptep = get_locked_pte(mm, va, &ptl); + if (!ptep) + return -ENOMEM; + pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL & ~_PAGE_GLOBAL)); + set_pte_at(mm, va, ptep, pte); + pte_unmap_unlock(ptep, ptl); + } + + if (mm->context.ldt) { + /* + * We already had an LDT. The top-level entry should already + * have been allocated and synchronized with the usermode + * tables. + */ + WARN_ON(!had_top_level_entry); + if (static_cpu_has(X86_FEATURE_PTI)) + WARN_ON(!kernel_to_user_pgdp(pgd)->pgd); + } else { + /* + * This is the first time we're mapping an LDT for this process. + * Sync the pgd to the usermode tables. + */ + WARN_ON(had_top_level_entry); + if (static_cpu_has(X86_FEATURE_PTI)) { + WARN_ON(kernel_to_user_pgdp(pgd)->pgd); + set_pgd(kernel_to_user_pgdp(pgd), *pgd); + } + } + + va = (unsigned long)ldt_slot_va(slot); + flush_tlb_mm_range(mm, va, va + LDT_SLOT_STRIDE, 0); + + ldt->slot = slot; +#endif + return 0; +} + +static void free_ldt_pgtables(struct mm_struct *mm) +{ +#ifdef CONFIG_PAGE_TABLE_ISOLATION + struct mmu_gather tlb; + unsigned long start = LDT_BASE_ADDR; + unsigned long end = start + (1UL << PGDIR_SHIFT); + + if (!static_cpu_has(X86_FEATURE_PTI)) + return; + + tlb_gather_mmu(&tlb, mm, start, end); + free_pgd_range(&tlb, start, end, start, end); + tlb_finish_mmu(&tlb, start, end); +#endif +} + /* After calling this, the LDT is immutable. 
*/ static void finalize_ldt_struct(struct ldt_struct *ldt) { @@ -156,6 +266,12 @@ int ldt_dup_context(struct mm_struct *old_mm, struct mm_struct *mm) new_ldt->nr_entries * LDT_ENTRY_SIZE); finalize_ldt_struct(new_ldt); + retval = map_ldt_struct(mm, new_ldt, 0); + if (retval) { + free_ldt_pgtables(mm); + free_ldt_struct(new_ldt); + goto out_unlock; + } mm->context.ldt = new_ldt; out_unlock: @@ -174,6 +290,11 @@ void destroy_context_ldt(struct mm_struct *mm) mm->context.ldt = NULL; } +void ldt_arch_exit_mmap(struct mm_struct *mm) +{ + free_ldt_pgtables(mm); +} + static int read_ldt(void __user *ptr, unsigned long bytecount) { struct mm_struct *mm = current->mm; @@ -287,6 +408,18 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) new_ldt->entries[ldt_info.entry_number] = ldt; finalize_ldt_struct(new_ldt); + /* + * If we are using PTI, map the new LDT into the userspace pagetables. + * If there is already an LDT, use the other slot so that other CPUs + * will continue to use the old LDT until install_ldt() switches + * them over to the new LDT. + */ + error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0); + if (error) { + free_ldt_struct(old_ldt); + goto out_unlock; + } + install_ldt(mm, new_ldt); free_ldt_struct(old_ldt); error = 0; diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 43dedbfb7257..690eaf31ca34 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -52,11 +52,17 @@ enum address_markers_idx { USER_SPACE_NR = 0, KERNEL_SPACE_NR, LOW_KERNEL_NR, +#if defined(CONFIG_MODIFY_LDT_SYSCALL) && defined(CONFIG_X86_5LEVEL) + LDT_NR, +#endif VMALLOC_START_NR, VMEMMAP_START_NR, #ifdef CONFIG_KASAN KASAN_SHADOW_START_NR, KASAN_SHADOW_END_NR, +#endif +#if defined(CONFIG_MODIFY_LDT_SYSCALL) && !defined(CONFIG_X86_5LEVEL) + LDT_NR, #endif CPU_ENTRY_AREA_NR, #ifdef CONFIG_X86_ESPFIX64 @@ -81,6 +87,9 @@ static struct addr_marker address_markers[] = { #ifdef CONFIG_KASAN [KASAN_SHADOW_START_NR] = { KASAN_SHADOW_START, "KASAN shadow" }, [KASAN_SHADOW_END_NR] = { KASAN_SHADOW_END, "KASAN shadow end" }, +#endif +#ifdef CONFIG_MODIFY_LDT_SYSCALL + [LDT_NR] = { LDT_BASE_ADDR, "LDT remap" }, #endif [CPU_ENTRY_AREA_NR] = { CPU_ENTRY_AREA_BASE,"CPU entry Area" }, #ifdef CONFIG_X86_ESPFIX64 From 9617ee896217558a2488b6dc71968a4489fb18b6 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 12 Dec 2017 07:56:42 -0800 Subject: [PATCH 0828/1013] x86/pti: Map the vsyscall page if needed commit 85900ea51577e31b186e523c8f4e068c79ecc7d3 upstream. Make VSYSCALLs work fully in PTI mode by mapping them properly to the user space visible page tables. [ tglx: Hide unused functions (Patch by Arnd Bergmann) ] Signed-off-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Kees Cook Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/vsyscall/vsyscall_64.c | 6 +-- arch/x86/include/asm/vsyscall.h | 1 + arch/x86/mm/pti.c | 65 +++++++++++++++++++++++++++ 3 files changed, 69 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/vsyscall/vsyscall_64.c b/arch/x86/entry/vsyscall/vsyscall_64.c index 1faf40f2dda9..577fa8adb785 100644 --- a/arch/x86/entry/vsyscall/vsyscall_64.c +++ b/arch/x86/entry/vsyscall/vsyscall_64.c @@ -344,14 +344,14 @@ int in_gate_area_no_mm(unsigned long addr) * vsyscalls but leave the page not present. 
If so, we skip calling * this. */ -static void __init set_vsyscall_pgtable_user_bits(void) +void __init set_vsyscall_pgtable_user_bits(pgd_t *root) { pgd_t *pgd; p4d_t *p4d; pud_t *pud; pmd_t *pmd; - pgd = pgd_offset_k(VSYSCALL_ADDR); + pgd = pgd_offset_pgd(root, VSYSCALL_ADDR); set_pgd(pgd, __pgd(pgd_val(*pgd) | _PAGE_USER)); p4d = p4d_offset(pgd, VSYSCALL_ADDR); #if CONFIG_PGTABLE_LEVELS >= 5 @@ -373,7 +373,7 @@ void __init map_vsyscall(void) vsyscall_mode == NATIVE ? PAGE_KERNEL_VSYSCALL : PAGE_KERNEL_VVAR); - set_vsyscall_pgtable_user_bits(); + set_vsyscall_pgtable_user_bits(swapper_pg_dir); } BUILD_BUG_ON((unsigned long)__fix_to_virt(VSYSCALL_PAGE) != diff --git a/arch/x86/include/asm/vsyscall.h b/arch/x86/include/asm/vsyscall.h index d9a7c659009c..b986b2ca688a 100644 --- a/arch/x86/include/asm/vsyscall.h +++ b/arch/x86/include/asm/vsyscall.h @@ -7,6 +7,7 @@ #ifdef CONFIG_X86_VSYSCALL_EMULATION extern void map_vsyscall(void); +extern void set_vsyscall_pgtable_user_bits(pgd_t *root); /* * Called on instruction fetch fault in vsyscall page. diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index b1c38ef9fbbb..bce8aea65606 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -38,6 +38,7 @@ #include #include +#include #include #include #include @@ -223,6 +224,69 @@ static pmd_t *pti_user_pagetable_walk_pmd(unsigned long address) return pmd_offset(pud, address); } +#ifdef CONFIG_X86_VSYSCALL_EMULATION +/* + * Walk the shadow copy of the page tables (optionally) trying to allocate + * page table pages on the way down. Does not support large pages. + * + * Note: this is only used when mapping *new* kernel data into the + * user/shadow page tables. It is never used for userspace data. + * + * Returns a pointer to a PTE on success, or NULL on failure. + */ +static __init pte_t *pti_user_pagetable_walk_pte(unsigned long address) +{ + gfp_t gfp = (GFP_KERNEL | __GFP_NOTRACK | __GFP_ZERO); + pmd_t *pmd = pti_user_pagetable_walk_pmd(address); + pte_t *pte; + + /* We can't do anything sensible if we hit a large mapping. */ + if (pmd_large(*pmd)) { + WARN_ON(1); + return NULL; + } + + if (pmd_none(*pmd)) { + unsigned long new_pte_page = __get_free_page(gfp); + if (!new_pte_page) + return NULL; + + if (pmd_none(*pmd)) { + set_pmd(pmd, __pmd(_KERNPG_TABLE | __pa(new_pte_page))); + new_pte_page = 0; + } + if (new_pte_page) + free_page(new_pte_page); + } + + pte = pte_offset_kernel(pmd, address); + if (pte_flags(*pte) & _PAGE_USER) { + WARN_ONCE(1, "attempt to walk to user pte\n"); + return NULL; + } + return pte; +} + +static void __init pti_setup_vsyscall(void) +{ + pte_t *pte, *target_pte; + unsigned int level; + + pte = lookup_address(VSYSCALL_ADDR, &level); + if (!pte || WARN_ON(level != PG_LEVEL_4K) || pte_none(*pte)) + return; + + target_pte = pti_user_pagetable_walk_pte(VSYSCALL_ADDR); + if (WARN_ON(!target_pte)) + return; + + *target_pte = *pte; + set_vsyscall_pgtable_user_bits(kernel_to_user_pgdp(swapper_pg_dir)); +} +#else +static void __init pti_setup_vsyscall(void) { } +#endif + static void __init pti_clone_pmds(unsigned long start, unsigned long end, pmdval_t clear) { @@ -319,4 +383,5 @@ void __init pti_init(void) pti_clone_user_shared(); pti_clone_entry_text(); pti_setup_espfix64(); + pti_setup_vsyscall(); } From c796e2324094f098575e47ec6d19f22cc4a4f9b9 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:57 +0100 Subject: [PATCH 0829/1013] x86/mm: Allow flushing for future ASID switches commit 2ea907c4fe7b78e5840c1dc07800eae93248cad1 upstream. 
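The pti_user_pagetable_walk_*() helpers above all share one idiom: optimistically allocate a page for the next level, re-check whether the entry is still empty, install the page if so, and free it otherwise. Below is a stripped-down, single-threaded userspace sketch of just that shape; walk_or_alloc() and the single slot variable are invented for the example, and the real code additionally handles per-level types, GFP flags, and entries that may have been populated from another path in the meantime.

#include <stdio.h>
#include <stdlib.h>

/* One "table slot", standing in for a pgd/p4d/pud/pmd entry. */
static void *slot;

static void *walk_or_alloc(void)
{
	if (slot == NULL) {
		void *new_page = calloc(1, 4096);   /* zeroed "page table page" */
		if (!new_page)
			return NULL;                /* allocation failure: give up */
		if (slot == NULL) {
			slot = new_page;            /* entry still empty: install ours */
			new_page = NULL;
		}
		if (new_page)
			free(new_page);             /* entry already populated: drop ours */
	}
	return slot;
}

int main(void)
{
	printf("first walk:  %p\n", walk_or_alloc());
	printf("second walk: %p\n", walk_or_alloc()); /* same entry, no new allocation */
	free(slot);
	return 0;
}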
If changing the page tables in such a way that an invalidation of all contexts (aka. PCIDs / ASIDs) is required, they can be actively invalidated by: 1. INVPCID for each PCID (works for single pages too). 2. Load CR3 with each PCID without the NOFLUSH bit set 3. Load CR3 with the NOFLUSH bit set for each and do INVLPG for each address. But, none of these are really feasible since there are ~6 ASIDs (12 with PAGE_TABLE_ISOLATION) at the time that invalidation is required. Instead of actively invalidating them, invalidate the *current* context and also mark the cpu_tlbstate _quickly_ to indicate future invalidation to be required. At the next context-switch, look for this indicator ('invalidate_other' being set) invalidate all of the cpu_tlbstate.ctxs[] entries. This ensures that any future context switches will do a full flush of the TLB, picking up the previous changes. [ tglx: Folded more fixups from Peter ] Signed-off-by: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 37 ++++++++++++++++++++++++++------- arch/x86/mm/tlb.c | 35 +++++++++++++++++++++++++++++++ 2 files changed, 64 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 171b429f43a2..490a706fdba8 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -134,6 +134,17 @@ struct tlb_state { */ bool is_lazy; + /* + * If set we changed the page tables in such a way that we + * needed an invalidation of all contexts (aka. PCIDs / ASIDs). + * This tells us to go invalidate all the non-loaded ctxs[] + * on the next context switch. + * + * The current ctx was kept up-to-date as it ran and does not + * need to be invalidated. + */ + bool invalidate_other; + /* * Access to this CR4 shadow and to H/W CR4 is protected by * disabling interrupts when modifying either one. @@ -211,6 +222,14 @@ static inline unsigned long cr4_read_shadow(void) return this_cpu_read(cpu_tlbstate.cr4); } +/* + * Mark all other ASIDs as invalid, preserves the current. + */ +static inline void invalidate_other_asid(void) +{ + this_cpu_write(cpu_tlbstate.invalidate_other, true); +} + /* * Save some of cr4 feature set we're using (e.g. Pentium 4MB * enable and PPro Global page enable), so that any CPU's that boot @@ -298,14 +317,6 @@ static inline void __flush_tlb_all(void) */ __flush_tlb(); } - - /* - * Note: if we somehow had PCID but not PGE, then this wouldn't work -- - * we'd end up flushing kernel translations for the current ASID but - * we might fail to flush kernel translations for other cached ASIDs. - * - * To avoid this issue, we force PCID off if PGE is off. - */ } /* @@ -315,6 +326,16 @@ static inline void __flush_tlb_one(unsigned long addr) { count_vm_tlb_event(NR_TLB_LOCAL_FLUSH_ONE); __flush_tlb_single(addr); + + if (!static_cpu_has(X86_FEATURE_PTI)) + return; + + /* + * __flush_tlb_single() will have cleared the TLB entry for this ASID, + * but since kernel space is replicated across all, we must also + * invalidate all others. 
+ */ + invalidate_other_asid(); } #define TLB_FLUSH_ALL -1UL diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 0a1be3adc97e..254c9eb79fe5 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -28,6 +28,38 @@ * Implement flush IPI by CALL_FUNCTION_VECTOR, Alex Shi */ +/* + * We get here when we do something requiring a TLB invalidation + * but could not go invalidate all of the contexts. We do the + * necessary invalidation by clearing out the 'ctx_id' which + * forces a TLB flush when the context is loaded. + */ +void clear_asid_other(void) +{ + u16 asid; + + /* + * This is only expected to be set if we have disabled + * kernel _PAGE_GLOBAL pages. + */ + if (!static_cpu_has(X86_FEATURE_PTI)) { + WARN_ON_ONCE(1); + return; + } + + for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) { + /* Do not need to flush the current asid */ + if (asid == this_cpu_read(cpu_tlbstate.loaded_mm_asid)) + continue; + /* + * Make sure the next time we go to switch to + * this asid, we do a flush: + */ + this_cpu_write(cpu_tlbstate.ctxs[asid].ctx_id, 0); + } + this_cpu_write(cpu_tlbstate.invalidate_other, false); +} + atomic64_t last_mm_ctx_id = ATOMIC64_INIT(1); @@ -42,6 +74,9 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen, return; } + if (this_cpu_read(cpu_tlbstate.invalidate_other)) + clear_asid_other(); + for (asid = 0; asid < TLB_NR_DYN_ASIDS; asid++) { if (this_cpu_read(cpu_tlbstate.ctxs[asid].ctx_id) != next->context.ctx_id) From 954339c41cceca991c0aec6dd3b7e164c5b9f48b Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:07:58 +0100 Subject: [PATCH 0830/1013] x86/mm: Abstract switching CR3 commit 48e111982cda033fec832c6b0592c2acedd85d04 upstream. In preparation to adding additional PCID flushing, abstract the loading of a new ASID into CR3. [ PeterZ: Split out from big combo patch ] Signed-off-by: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/tlb.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 254c9eb79fe5..42a8875f73fe 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -100,6 +100,24 @@ static void choose_new_asid(struct mm_struct *next, u64 next_tlb_gen, *need_flush = true; } +static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush) +{ + unsigned long new_mm_cr3; + + if (need_flush) { + new_mm_cr3 = build_cr3(pgdir, new_asid); + } else { + new_mm_cr3 = build_cr3_noflush(pgdir, new_asid); + } + + /* + * Caution: many callers of this function expect + * that load_cr3() is serializing and orders TLB + * fills with respect to the mm_cpumask writes. 
+ */ + write_cr3(new_mm_cr3); +} + void leave_mm(int cpu) { struct mm_struct *loaded_mm = this_cpu_read(cpu_tlbstate.loaded_mm); @@ -230,7 +248,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, if (need_flush) { this_cpu_write(cpu_tlbstate.ctxs[new_asid].ctx_id, next->context.ctx_id); this_cpu_write(cpu_tlbstate.ctxs[new_asid].tlb_gen, next_tlb_gen); - write_cr3(build_cr3(next->pgd, new_asid)); + load_new_mm_cr3(next->pgd, new_asid, true); /* * NB: This gets called via leave_mm() in the idle path @@ -243,7 +261,7 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next, trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, TLB_FLUSH_ALL); } else { /* The new ASID is already up to date. */ - write_cr3(build_cr3_noflush(next->pgd, new_asid)); + load_new_mm_cr3(next->pgd, new_asid, false); /* See above wrt _rcuidle. */ trace_tlb_flush_rcuidle(TLB_FLUSH_ON_TASK_SWITCH, 0); From b63812b81349e5d1a35107e2464547187bc25a61 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 4 Dec 2017 15:07:59 +0100 Subject: [PATCH 0831/1013] x86/mm: Use/Fix PCID to optimize user/kernel switches commit 6fd166aae78c0ab738d49bda653cbd9e3b1491cf upstream. We can use PCID to retain the TLBs across CR3 switches; including those now part of the user/kernel switch. This increases performance of kernel entry/exit at the cost of more expensive/complicated TLB flushing. Now that we have two address spaces, one for kernel and one for user space, we need two PCIDs per mm. We use the top PCID bit to indicate a user PCID (just like we use the PFN LSB for the PGD). Since we do TLB invalidation from kernel space, the existing code will only invalidate the kernel PCID, we augment that by marking the corresponding user PCID invalid, and upon switching back to userspace, use a flushing CR3 write for the switch. In order to access the user_pcid_flush_mask we use PER_CPU storage, which means the previously established SWAPGS vs CR3 ordering is now mandatory and required. Having to do this memory access does require additional registers, most sites have a functioning stack and we can spill one (RAX), sites without functional stack need to otherwise provide the second scratch register. Note: PCID is generally available on Intel Sandybridge and later CPUs. Note: Up until this point TLB flushing was broken in this series. Based-on-code-from: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/calling.h | 72 +++++++++++++--- arch/x86/entry/entry_64.S | 9 +- arch/x86/entry/entry_64_compat.S | 4 +- arch/x86/include/asm/processor-flags.h | 5 ++ arch/x86/include/asm/tlbflush.h | 91 ++++++++++++++++++--- arch/x86/include/uapi/asm/processor-flags.h | 7 +- arch/x86/kernel/asm-offsets.c | 4 + arch/x86/mm/init.c | 2 +- arch/x86/mm/tlb.c | 1 + 9 files changed, 162 insertions(+), 33 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 3d3389a92c33..7894e5c0eef7 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -3,6 +3,9 @@ #include #include #include +#include +#include +#include /* @@ -191,17 +194,21 @@ For 32-bit we have the following conventions - kernel is built with #ifdef CONFIG_PAGE_TABLE_ISOLATION -/* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two halves: */ -#define PTI_SWITCH_MASK (1< #include #include -#include "calling.h" #include #include #include @@ -40,6 +39,8 @@ #include #include +#include "calling.h" + .code64 .section .entry.text, "ax" @@ -406,7 +407,7 @@ syscall_return_via_sysret: * We are on the trampoline stack. All regs except RDI are live. * We can do future final exit work right here. */ - SWITCH_TO_USER_CR3 scratch_reg=%rdi + SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi popq %rdi popq %rsp @@ -744,7 +745,7 @@ GLOBAL(swapgs_restore_regs_and_return_to_usermode) * We can do future final exit work right here. */ - SWITCH_TO_USER_CR3 scratch_reg=%rdi + SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi /* Restore RDI. */ popq %rdi @@ -857,7 +858,7 @@ native_irq_return_ldt: */ orq PER_CPU_VAR(espfix_stack), %rax - SWITCH_TO_USER_CR3 scratch_reg=%rdi /* to user CR3 */ + SWITCH_TO_USER_CR3_STACK scratch_reg=%rdi SWAPGS /* to user GS */ popq %rdi /* Restore user RDI */ diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 05238b29895e..40f17009ec20 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -275,9 +275,9 @@ sysret32_from_system_call: * switch until after after the last reference to the process * stack. * - * %r8 is zeroed before the sysret, thus safe to clobber. + * %r8/%r9 are zeroed before the sysret, thus safe to clobber. 
*/ - SWITCH_TO_USER_CR3 scratch_reg=%r8 + SWITCH_TO_USER_CR3_NOSTACK scratch_reg=%r8 scratch_reg2=%r9 xorq %r8, %r8 xorq %r9, %r9 diff --git a/arch/x86/include/asm/processor-flags.h b/arch/x86/include/asm/processor-flags.h index 43212a43ee69..6a60fea90b9d 100644 --- a/arch/x86/include/asm/processor-flags.h +++ b/arch/x86/include/asm/processor-flags.h @@ -38,6 +38,11 @@ #define CR3_ADDR_MASK __sme_clr(0x7FFFFFFFFFFFF000ull) #define CR3_PCID_MASK 0xFFFull #define CR3_NOFLUSH BIT_ULL(63) + +#ifdef CONFIG_PAGE_TABLE_ISOLATION +# define X86_CR3_PTI_SWITCH_BIT 11 +#endif + #else /* * CR3_ADDR_MASK needs at least bits 31:5 set on PAE systems, and we save diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 490a706fdba8..5dcc38b16604 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -10,6 +10,8 @@ #include #include #include +#include +#include static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) { @@ -24,24 +26,54 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) /* There are 12 bits of space for ASIDS in CR3 */ #define CR3_HW_ASID_BITS 12 + /* * When enabled, PAGE_TABLE_ISOLATION consumes a single bit for * user/kernel switches */ -#define PTI_CONSUMED_ASID_BITS 0 +#ifdef CONFIG_PAGE_TABLE_ISOLATION +# define PTI_CONSUMED_PCID_BITS 1 +#else +# define PTI_CONSUMED_PCID_BITS 0 +#endif + +#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS) -#define CR3_AVAIL_ASID_BITS (CR3_HW_ASID_BITS - PTI_CONSUMED_ASID_BITS) /* * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account * for them being zero-based. Another -1 is because ASID 0 is reserved for * use by non-PCID-aware users. */ -#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_ASID_BITS) - 2) +#define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2) + +/* + * 6 because 6 should be plenty and struct tlb_state will fit in two cache + * lines. + */ +#define TLB_NR_DYN_ASIDS 6 static inline u16 kern_pcid(u16 asid) { VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); + +#ifdef CONFIG_PAGE_TABLE_ISOLATION + /* + * Make sure that the dynamic ASID space does not confict with the + * bit we are using to switch between user and kernel ASIDs. + */ + BUILD_BUG_ON(TLB_NR_DYN_ASIDS >= (1 << X86_CR3_PTI_SWITCH_BIT)); + /* + * The ASID being passed in here should have respected the + * MAX_ASID_AVAILABLE and thus never have the switch bit set. 
+ */ + VM_WARN_ON_ONCE(asid & (1 << X86_CR3_PTI_SWITCH_BIT)); +#endif + /* + * The dynamically-assigned ASIDs that get passed in are small + * (mm == NULL then we borrow a mm which may change during a - * task switch and therefore we must not be preempted while we write CR3 - * back: + * If current->mm == NULL then we borrow a mm which may change + * during a task switch and therefore we must not be preempted + * while we write CR3 back: */ preempt_disable(); native_write_cr3(__native_read_cr3()); @@ -301,7 +361,14 @@ static inline void __native_flush_tlb_global(void) */ static inline void __native_flush_tlb_single(unsigned long addr) { + u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); + + if (!static_cpu_has(X86_FEATURE_PTI)) + return; + + invalidate_user_asid(loaded_mm_asid); } /* diff --git a/arch/x86/include/uapi/asm/processor-flags.h b/arch/x86/include/uapi/asm/processor-flags.h index 53b4ca55ebb6..97abdaab9535 100644 --- a/arch/x86/include/uapi/asm/processor-flags.h +++ b/arch/x86/include/uapi/asm/processor-flags.h @@ -78,7 +78,12 @@ #define X86_CR3_PWT _BITUL(X86_CR3_PWT_BIT) #define X86_CR3_PCD_BIT 4 /* Page Cache Disable */ #define X86_CR3_PCD _BITUL(X86_CR3_PCD_BIT) -#define X86_CR3_PCID_MASK _AC(0x00000fff,UL) /* PCID Mask */ + +#define X86_CR3_PCID_BITS 12 +#define X86_CR3_PCID_MASK (_AC((1UL << X86_CR3_PCID_BITS) - 1, UL)) + +#define X86_CR3_PCID_NOFLUSH_BIT 63 /* Preserve old PCID */ +#define X86_CR3_PCID_NOFLUSH _BITULL(X86_CR3_PCID_NOFLUSH_BIT) /* * Intel CPU features in CR4 diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 676b7cf4b62b..76417a9aab73 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -17,6 +17,7 @@ #include #include #include +#include #ifdef CONFIG_XEN #include @@ -94,6 +95,9 @@ void common(void) { BLANK(); DEFINE(PTREGS_SIZE, sizeof(struct pt_regs)); + /* TLB state for the entry code */ + OFFSET(TLB_STATE_user_pcid_flush_mask, tlb_state, user_pcid_flush_mask); + /* Layout info for cpu_entry_area */ OFFSET(CPU_ENTRY_AREA_tss, cpu_entry_area, tss); OFFSET(CPU_ENTRY_AREA_entry_trampoline, cpu_entry_area, entry_trampoline); diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index af75069fb116..caeb8a7bf0a4 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -855,7 +855,7 @@ void __init zone_sizes_init(void) free_area_init_nodes(max_zone_pfns); } -DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = { +__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = { .loaded_mm = &init_mm, .next_asid = 1, .cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 42a8875f73fe..a1561957dccb 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -105,6 +105,7 @@ static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush) unsigned long new_mm_cr3; if (need_flush) { + invalidate_user_asid(new_asid); new_mm_cr3 = build_cr3(pgdir, new_asid); } else { new_mm_cr3 = build_cr3_noflush(pgdir, new_asid); From 36a72ab52c8d969a7a302082f52731c1be0e9ada Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 4 Dec 2017 15:08:00 +0100 Subject: [PATCH 0832/1013] x86/mm: Optimize RESTORE_CR3 commit 21e94459110252d41b45c0c8ba50fd72a664d50c upstream. Most NMI/paranoid exceptions will not in fact change pagetables and would thus not require TLB flushing, however RESTORE_CR3 uses flushing CR3 writes. 
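The tracking that makes a cheaper restore possible is the per-CPU user_pcid_flush_mask introduced by the previous patch: whenever the kernel flushes a kernel PCID it records that the matching user PCID is stale instead of flushing it immediately. The body of invalidate_user_asid() is not visible in the hunks quoted here; conceptually it has to set the corresponding bit in that mask, roughly like the following sketch (helper body and field layout assumed, not a verbatim copy):

	/*
	 * Sketch only: remember that the user half of this ASID must get a
	 * flushing CR3 write on the next exit to user space. The RESTORE_CR3
	 * macro below tests and clears the bit with bt/btr.
	 */
	static inline void invalidate_user_asid(u16 asid)
	{
		if (!static_cpu_has(X86_FEATURE_PTI))
			return;

		/* cast: the mask is assumed to be narrower than unsigned long */
		__set_bit(kern_pcid(asid),
			  (unsigned long *)this_cpu_ptr(&cpu_tlbstate.user_pcid_flush_mask));
	}
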
Restores to kernel PCIDs can be NOFLUSH, because we explicitly flush the kernel mappings and now that we track which user PCIDs need flushing we can avoid those too when possible. This does mean RESTORE_CR3 needs an additional scratch_reg, luckily both sites have plenty available. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/calling.h | 30 ++++++++++++++++++++++++++++-- arch/x86/entry/entry_64.S | 4 ++-- 2 files changed, 30 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h index 7894e5c0eef7..45a63e00a6af 100644 --- a/arch/x86/entry/calling.h +++ b/arch/x86/entry/calling.h @@ -281,8 +281,34 @@ For 32-bit we have the following conventions - kernel is built with .Ldone_\@: .endm -.macro RESTORE_CR3 save_reg:req +.macro RESTORE_CR3 scratch_reg:req save_reg:req ALTERNATIVE "jmp .Lend_\@", "", X86_FEATURE_PTI + + ALTERNATIVE "jmp .Lwrcr3_\@", "", X86_FEATURE_PCID + + /* + * KERNEL pages can always resume with NOFLUSH as we do + * explicit flushes. + */ + bt $X86_CR3_PTI_SWITCH_BIT, \save_reg + jnc .Lnoflush_\@ + + /* + * Check if there's a pending flush for the user ASID we're + * about to set. + */ + movq \save_reg, \scratch_reg + andq $(0x7FF), \scratch_reg + bt \scratch_reg, THIS_CPU_user_pcid_flush_mask + jnc .Lnoflush_\@ + + btr \scratch_reg, THIS_CPU_user_pcid_flush_mask + jmp .Lwrcr3_\@ + +.Lnoflush_\@: + SET_NOFLUSH_BIT \save_reg + +.Lwrcr3_\@: /* * The CR3 write could be avoided when not changing its value, * but would require a CR3 read *and* a scratch register. @@ -301,7 +327,7 @@ For 32-bit we have the following conventions - kernel is built with .endm .macro SAVE_AND_SWITCH_TO_KERNEL_CR3 scratch_reg:req save_reg:req .endm -.macro RESTORE_CR3 save_reg:req +.macro RESTORE_CR3 scratch_reg:req save_reg:req .endm #endif diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index e15d4ba19b3d..dd696b966e58 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1288,7 +1288,7 @@ ENTRY(paranoid_exit) testl %ebx, %ebx /* swapgs needed? */ jnz .Lparanoid_exit_no_swapgs TRACE_IRQS_IRETQ - RESTORE_CR3 save_reg=%r14 + RESTORE_CR3 scratch_reg=%rbx save_reg=%r14 SWAPGS_UNSAFE_STACK jmp .Lparanoid_exit_restore .Lparanoid_exit_no_swapgs: @@ -1730,7 +1730,7 @@ end_repeat_nmi: movq $-1, %rsi call do_nmi - RESTORE_CR3 save_reg=%r14 + RESTORE_CR3 scratch_reg=%r15 save_reg=%r14 testl %ebx, %ebx /* swapgs needed? */ jnz nmi_restore From c5548af97ae98f9d9e6ae5a9a005e605bd3c06b5 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:08:01 +0100 Subject: [PATCH 0833/1013] x86/mm: Use INVPCID for __native_flush_tlb_single() commit 6cff64b86aaaa07f89f50498055a20e45754b0c1 upstream. This uses INVPCID to shoot down individual lines of the user mapping instead of marking the entire user map as invalid. This could/might/possibly be faster. This for sure needs tlb_single_page_flush_ceiling to be redetermined; esp. since INVPCID is _slow_. 
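Condensed, the per-address flush described here picks between the two mechanisms as in the sketch below (a placeholder-named summary of the __native_flush_tlb_single() hunk shown further down, not a drop-in copy):

	/* Sketch: flush one address in both the kernel and user mappings. */
	static void flush_one_user_address(unsigned long addr)
	{
		u32 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid);

		/* The kernel-side TLB entry is always dropped with INVLPG. */
		asm volatile("invlpg (%0)" ::"r" (addr) : "memory");

		if (!static_cpu_has(X86_FEATURE_PTI))
			return;

		if (this_cpu_has(X86_FEATURE_INVPCID_SINGLE))
			/* Shoot down only this address in the user PCID. */
			invpcid_flush_one(user_pcid(asid), addr);
		else
			/* No usable INVPCID yet: mark the whole user PCID stale. */
			invalidate_user_asid(asid);
	}
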
A detailed performance analysis is available here: https://lkml.kernel.org/r/3062e486-3539-8a1f-5724-16199420be71@intel.com [ Peterz: Split out from big combo patch ] Signed-off-by: Dave Hansen Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/tlbflush.h | 23 ++++++++++- arch/x86/mm/init.c | 64 +++++++++++++++++------------- 3 files changed, 60 insertions(+), 28 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d8ec834ea884..07cdd1715705 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -197,6 +197,7 @@ #define X86_FEATURE_CAT_L3 ( 7*32+ 4) /* Cache Allocation Technology L3 */ #define X86_FEATURE_CAT_L2 ( 7*32+ 5) /* Cache Allocation Technology L2 */ #define X86_FEATURE_CDP_L3 ( 7*32+ 6) /* Code and Data Prioritization L3 */ +#define X86_FEATURE_INVPCID_SINGLE ( 7*32+ 7) /* Effectively INVPCID && CR4.PCIDE=1 */ #define X86_FEATURE_HW_PSTATE ( 7*32+ 8) /* AMD HW-PState */ #define X86_FEATURE_PROC_FEEDBACK ( 7*32+ 9) /* AMD ProcFeedbackInterface */ diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 5dcc38b16604..57072a1052fe 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -85,6 +85,18 @@ static inline u16 kern_pcid(u16 asid) return asid + 1; } +/* + * The user PCID is just the kernel one, plus the "switch bit". + */ +static inline u16 user_pcid(u16 asid) +{ + u16 ret = kern_pcid(asid); +#ifdef CONFIG_PAGE_TABLE_ISOLATION + ret |= 1 << X86_CR3_PTI_SWITCH_BIT; +#endif + return ret; +} + struct pgd_t; static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) { @@ -335,6 +347,8 @@ static inline void __native_flush_tlb_global(void) /* * Using INVPCID is considerably faster than a pair of writes * to CR4 sandwiched inside an IRQ flag save/restore. + * + * Note, this works with CR4.PCIDE=0 or 1. */ invpcid_flush_all(); return; @@ -368,7 +382,14 @@ static inline void __native_flush_tlb_single(unsigned long addr) if (!static_cpu_has(X86_FEATURE_PTI)) return; - invalidate_user_asid(loaded_mm_asid); + /* + * Some platforms #GP if we call invpcid(type=1/2) before CR4.PCIDE=1. + * Just use invalidate_user_asid() in case we are called early. + */ + if (!this_cpu_has(X86_FEATURE_INVPCID_SINGLE)) + invalidate_user_asid(loaded_mm_asid); + else + invpcid_flush_one(user_pcid(loaded_mm_asid), addr); } /* diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index caeb8a7bf0a4..80259ad8c386 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -203,34 +203,44 @@ static void __init probe_page_size_mask(void) static void setup_pcid(void) { -#ifdef CONFIG_X86_64 - if (boot_cpu_has(X86_FEATURE_PCID)) { - if (boot_cpu_has(X86_FEATURE_PGE)) { - /* - * This can't be cr4_set_bits_and_update_boot() -- - * the trampoline code can't handle CR4.PCIDE and - * it wouldn't do any good anyway. Despite the name, - * cr4_set_bits_and_update_boot() doesn't actually - * cause the bits in question to remain set all the - * way through the secondary boot asm. 
- * - * Instead, we brute-force it and set CR4.PCIDE - * manually in start_secondary(). - */ - cr4_set_bits(X86_CR4_PCIDE); - } else { - /* - * flush_tlb_all(), as currently implemented, won't - * work if PCID is on but PGE is not. Since that - * combination doesn't exist on real hardware, there's - * no reason to try to fully support it, but it's - * polite to avoid corrupting data if we're on - * an improperly configured VM. - */ - setup_clear_cpu_cap(X86_FEATURE_PCID); - } + if (!IS_ENABLED(CONFIG_X86_64)) + return; + + if (!boot_cpu_has(X86_FEATURE_PCID)) + return; + + if (boot_cpu_has(X86_FEATURE_PGE)) { + /* + * This can't be cr4_set_bits_and_update_boot() -- the + * trampoline code can't handle CR4.PCIDE and it wouldn't + * do any good anyway. Despite the name, + * cr4_set_bits_and_update_boot() doesn't actually cause + * the bits in question to remain set all the way through + * the secondary boot asm. + * + * Instead, we brute-force it and set CR4.PCIDE manually in + * start_secondary(). + */ + cr4_set_bits(X86_CR4_PCIDE); + + /* + * INVPCID's single-context modes (2/3) only work if we set + * X86_CR4_PCIDE, *and* we INVPCID support. It's unusable + * on systems that have X86_CR4_PCIDE clear, or that have + * no INVPCID support at all. + */ + if (boot_cpu_has(X86_FEATURE_INVPCID)) + setup_force_cpu_cap(X86_FEATURE_INVPCID_SINGLE); + } else { + /* + * flush_tlb_all(), as currently implemented, won't work if + * PCID is on but PGE is not. Since that combination + * doesn't exist on real hardware, there's no reason to try + * to fully support it, but it's polite to avoid corrupting + * data if we're on an improperly configured VM. + */ + setup_clear_cpu_cap(X86_FEATURE_PCID); } -#endif } #ifdef CONFIG_X86_32 From ef4b38472d6b1bf587554dfc7d5ab7abc835c1a5 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 5 Dec 2017 13:34:53 +0100 Subject: [PATCH 0834/1013] x86/mm: Clarify the whole ASID/kernel PCID/user PCID naming commit 0a126abd576ebc6403f063dbe20cf7416c9d9393 upstream. Ideally we'd also use sparse to enforce this separation so it becomes much more difficult to mess up. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 55 ++++++++++++++++++++++++++------- 1 file changed, 43 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 57072a1052fe..b519da4fc03c 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -13,16 +13,33 @@ #include #include -static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) -{ - /* - * Bump the generation count. This also serves as a full barrier - * that synchronizes with switch_mm(): callers are required to order - * their read of mm_cpumask after their writes to the paging - * structures. - */ - return atomic64_inc_return(&mm->context.tlb_gen); -} +/* + * The x86 feature is called PCID (Process Context IDentifier). It is similar + * to what is traditionally called ASID on the RISC processors. 
+ * + * We don't use the traditional ASID implementation, where each process/mm gets + * its own ASID and flush/restart when we run out of ASID space. + * + * Instead we have a small per-cpu array of ASIDs and cache the last few mm's + * that came by on this CPU, allowing cheaper switch_mm between processes on + * this CPU. + * + * We end up with different spaces for different things. To avoid confusion we + * use different names for each of them: + * + * ASID - [0, TLB_NR_DYN_ASIDS-1] + * the canonical identifier for an mm + * + * kPCID - [1, TLB_NR_DYN_ASIDS] + * the value we write into the PCID part of CR3; corresponds to the + * ASID+1, because PCID 0 is special. + * + * uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS] + * for KPTI each mm has two address spaces and thus needs two + * PCID values, but we can still do with a single ASID denomination + * for each mm. Corresponds to kPCID + 2048. + * + */ /* There are 12 bits of space for ASIDS in CR3 */ #define CR3_HW_ASID_BITS 12 @@ -41,7 +58,7 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) /* * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. -1 below to account - * for them being zero-based. Another -1 is because ASID 0 is reserved for + * for them being zero-based. Another -1 is because PCID 0 is reserved for * use by non-PCID-aware users. */ #define MAX_ASID_AVAILABLE ((1 << CR3_AVAIL_PCID_BITS) - 2) @@ -52,6 +69,9 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) */ #define TLB_NR_DYN_ASIDS 6 +/* + * Given @asid, compute kPCID + */ static inline u16 kern_pcid(u16 asid) { VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); @@ -86,7 +106,7 @@ static inline u16 kern_pcid(u16 asid) } /* - * The user PCID is just the kernel one, plus the "switch bit". + * Given @asid, compute uPCID */ static inline u16 user_pcid(u16 asid) { @@ -484,6 +504,17 @@ static inline void flush_tlb_page(struct vm_area_struct *vma, unsigned long a) void native_flush_tlb_others(const struct cpumask *cpumask, const struct flush_tlb_info *info); +static inline u64 inc_mm_tlb_gen(struct mm_struct *mm) +{ + /* + * Bump the generation count. This also serves as a full barrier + * that synchronizes with switch_mm(): callers are required to order + * their read of mm_cpumask after their writes to the paging + * structures. + */ + return atomic64_inc_return(&mm->context.tlb_gen); +} + static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch, struct mm_struct *mm) { From 33d9d7836f0fa02777667d72bc815c12fbe61cac Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Tue, 19 Dec 2017 22:33:46 +0100 Subject: [PATCH 0835/1013] x86/dumpstack: Indicate in Oops whether PTI is configured and enabled commit 5f26d76c3fd67c48806415ef8b1116c97beff8ba upstream. CONFIG_PAGE_TABLE_ISOLATION is relatively new and intrusive feature that may still have some corner cases which could take some time to manifest and be fixed. It would be useful to have Oops messages indicate whether it was enabled for building the kernel, and whether it was disabled during boot. Example of fully enabled: Oops: 0001 [#1] SMP PTI Example of enabled during build, but disabled during boot: Oops: 0001 [#1] SMP NOPTI We can decide to remove this after the feature has been tested in the field long enough. 
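The whole difference between the two example lines is one extra piece of the format string; a minimal sketch of the suffix selection (the actual dumpstack.c hunk follows):

	/* Sketch: suffix appended to the Oops banner to reflect PTI state. */
	static const char *pti_oops_tag(void)
	{
		if (!IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION))
			return "";	/* not compiled in: print nothing */

		return boot_cpu_has(X86_FEATURE_PTI) ? " PTI" : " NOPTI";
	}
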
[ tglx: Made it use boot_cpu_has() as requested by Borislav ] Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Reviewed-by: Eduardo Valentin Acked-by: Dave Hansen Cc: Andy Lutomirski Cc: Andy Lutomirsky Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: bpetkov@suse.de Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: jkosina@suse.cz Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/dumpstack.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 36b17e0febe8..5fa110699ed2 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -297,11 +297,13 @@ int __die(const char *str, struct pt_regs *regs, long err) unsigned long sp; #endif printk(KERN_DEFAULT - "%s: %04lx [#%d]%s%s%s%s\n", str, err & 0xffff, ++die_counter, + "%s: %04lx [#%d]%s%s%s%s%s\n", str, err & 0xffff, ++die_counter, IS_ENABLED(CONFIG_PREEMPT) ? " PREEMPT" : "", IS_ENABLED(CONFIG_SMP) ? " SMP" : "", debug_pagealloc_enabled() ? " DEBUG_PAGEALLOC" : "", - IS_ENABLED(CONFIG_KASAN) ? " KASAN" : ""); + IS_ENABLED(CONFIG_KASAN) ? " KASAN" : "", + IS_ENABLED(CONFIG_PAGE_TABLE_ISOLATION) ? + (boot_cpu_has(X86_FEATURE_PTI) ? " PTI" : " NOPTI") : ""); if (notify_die(DIE_OOPS, str, regs, err, current->thread.trap_nr, SIGSEGV) == NOTIFY_STOP) From 3dfd9fd8d897214b1a880c7fd8ed36b88faa1c02 Mon Sep 17 00:00:00 2001 From: Dave Hansen Date: Mon, 4 Dec 2017 15:08:03 +0100 Subject: [PATCH 0836/1013] x86/mm/pti: Add Kconfig commit 385ce0ea4c078517fa51c261882c4e72fba53005 upstream. Finally allow CONFIG_PAGE_TABLE_ISOLATION to be enabled. PARAVIRT generally requires that the kernel not manage its own page tables. It also means that the hypervisor and kernel must agree wholeheartedly about what format the page tables are in and what they contain. PAGE_TABLE_ISOLATION, unfortunately, changes the rules and they can not be used together. I've seen conflicting feedback from maintainers lately about whether they want the Kconfig magic to go first or last in a patch series. It's going last here because the partially-applied series leads to kernels that can not boot in a bunch of cases. I did a run through the entire series with CONFIG_PAGE_TABLE_ISOLATION=y to look for build errors, though. [ tglx: Removed SMP and !PARAVIRT dependencies as they not longer exist ] Signed-off-by: Dave Hansen Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- security/Kconfig | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/security/Kconfig b/security/Kconfig index e8e449444e65..6614b9312b45 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -54,6 +54,17 @@ config SECURITY_NETWORK implement socket and networking access controls. If you are unsure how to answer this question, answer N. 
+config PAGE_TABLE_ISOLATION + bool "Remove the kernel mapping in user mode" + depends on X86_64 && !UML + default y + help + This feature reduces the number of hardware side channels by + ensuring that the majority of kernel addresses are not mapped + into userspace. + + See Documentation/x86/pagetable-isolation.txt for more details. + config SECURITY_INFINIBAND bool "Infiniband Security Hooks" depends on SECURITY && INFINIBAND From dfa58126d7630c0db0b64fa7bd15741293b517b5 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Mon, 4 Dec 2017 15:08:04 +0100 Subject: [PATCH 0837/1013] x86/mm/dump_pagetables: Add page table directory to the debugfs VFS hierarchy commit 75298aa179d56cd64f54e58a19fffc8ab922b4c0 upstream. The upcoming support for dumping the kernel and the user space page tables of the current process would create more random files in the top level debugfs directory. Add a page table directory and move the existing file to it. Signed-off-by: Borislav Petkov Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/debug_pagetables.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c index bfcffdf6c577..d1449fb6dc7a 100644 --- a/arch/x86/mm/debug_pagetables.c +++ b/arch/x86/mm/debug_pagetables.c @@ -22,21 +22,26 @@ static const struct file_operations ptdump_fops = { .release = single_release, }; -static struct dentry *pe; +static struct dentry *dir, *pe; static int __init pt_dump_debug_init(void) { - pe = debugfs_create_file("kernel_page_tables", S_IRUSR, NULL, NULL, - &ptdump_fops); - if (!pe) + dir = debugfs_create_dir("page_tables", NULL); + if (!dir) return -ENOMEM; + pe = debugfs_create_file("kernel", 0400, dir, NULL, &ptdump_fops); + if (!pe) + goto err; return 0; +err: + debugfs_remove_recursive(dir); + return -ENOMEM; } static void __exit pt_dump_debug_exit(void) { - debugfs_remove_recursive(pe); + debugfs_remove_recursive(dir); } module_init(pt_dump_debug_init); From 27e16c33bb796cceb92e339998d30145076b43cd Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:08:05 +0100 Subject: [PATCH 0838/1013] x86/mm/dump_pagetables: Check user space page table for WX pages commit b4bf4f924b1d7bade38fd51b2e401d20d0956e4d upstream. ptdump_walk_pgd_level_checkwx() checks the kernel page table for WX pages, but does not check the PAGE_TABLE_ISOLATION user space page table. Restructure the code so that dmesg output is selected by an explicit argument and not implicit via checking the pgd argument for !NULL. Add the check for the user space page table. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 1 + arch/x86/mm/debug_pagetables.c | 2 +- arch/x86/mm/dump_pagetables.c | 30 +++++++++++++++++++++++++----- 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 9aaeea687834..6b8bd9016eeb 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -28,6 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]; int __init __early_make_pgtable(unsigned long address, pmdval_t pmd); void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd); +void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd); void ptdump_walk_pgd_level_checkwx(void); #ifdef CONFIG_DEBUG_WX diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c index d1449fb6dc7a..8e70c1599e51 100644 --- a/arch/x86/mm/debug_pagetables.c +++ b/arch/x86/mm/debug_pagetables.c @@ -5,7 +5,7 @@ static int ptdump_show(struct seq_file *m, void *v) { - ptdump_walk_pgd_level(m, NULL); + ptdump_walk_pgd_level_debugfs(m, NULL); return 0; } diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 690eaf31ca34..17f5b417f95e 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -476,7 +476,7 @@ static inline bool is_hypervisor_range(int idx) } static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd, - bool checkwx) + bool checkwx, bool dmesg) { #ifdef CONFIG_X86_64 pgd_t *start = (pgd_t *) &init_top_pgt; @@ -489,7 +489,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd, if (pgd) { start = pgd; - st.to_dmesg = true; + st.to_dmesg = dmesg; } st.check_wx = checkwx; @@ -527,13 +527,33 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd, void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd) { - ptdump_walk_pgd_level_core(m, pgd, false); + ptdump_walk_pgd_level_core(m, pgd, false, true); +} + +void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd) +{ + ptdump_walk_pgd_level_core(m, pgd, false, false); +} +EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs); + +static void ptdump_walk_user_pgd_level_checkwx(void) +{ +#ifdef CONFIG_PAGE_TABLE_ISOLATION + pgd_t *pgd = (pgd_t *) &init_top_pgt; + + if (!static_cpu_has(X86_FEATURE_PTI)) + return; + + pr_info("x86/mm: Checking user space page tables\n"); + pgd = kernel_to_user_pgdp(pgd); + ptdump_walk_pgd_level_core(NULL, pgd, true, false); +#endif } -EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level); void ptdump_walk_pgd_level_checkwx(void) { - ptdump_walk_pgd_level_core(NULL, NULL, true); + ptdump_walk_pgd_level_core(NULL, NULL, true, false); + ptdump_walk_user_pgd_level_checkwx(); } static int __init pt_dump_init(void) From 704cfa04dde3a35cde59a67cafbcaeacbb3a76c8 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 4 Dec 2017 15:08:06 +0100 Subject: [PATCH 0839/1013] x86/mm/dump_pagetables: Allow dumping current pagetables commit a4b51ef6552c704764684cef7e753162dc87c5fa upstream. Add two debugfs files which allow to dump the pagetable of the current task. current_kernel dumps the regular page table. This is the page table which is normally shared between kernel and user space. 
If kernel page table isolation is enabled this is the kernel space mapping. If kernel page table isolation is enabled the second file, current_user, dumps the user space page table. These files allow to verify the resulting page tables for page table isolation, but even in the normal case its useful to be able to inspect user space page tables of current for debugging purposes. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Boris Ostrovsky Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Denys Vlasenko Cc: Eduardo Valentin Cc: Greg KH Cc: H. Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Will Deacon Cc: aliguori@amazon.com Cc: daniel.gruss@iaik.tugraz.at Cc: hughd@google.com Cc: keescook@google.com Cc: linux-mm@kvack.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/pgtable.h | 2 +- arch/x86/mm/debug_pagetables.c | 71 ++++++++++++++++++++++++++++++++-- arch/x86/mm/dump_pagetables.c | 6 ++- 3 files changed, 73 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 6b8bd9016eeb..211368922cad 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -28,7 +28,7 @@ extern pgd_t early_top_pgt[PTRS_PER_PGD]; int __init __early_make_pgtable(unsigned long address, pmdval_t pmd); void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd); -void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd); +void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user); void ptdump_walk_pgd_level_checkwx(void); #ifdef CONFIG_DEBUG_WX diff --git a/arch/x86/mm/debug_pagetables.c b/arch/x86/mm/debug_pagetables.c index 8e70c1599e51..421f2664ffa0 100644 --- a/arch/x86/mm/debug_pagetables.c +++ b/arch/x86/mm/debug_pagetables.c @@ -5,7 +5,7 @@ static int ptdump_show(struct seq_file *m, void *v) { - ptdump_walk_pgd_level_debugfs(m, NULL); + ptdump_walk_pgd_level_debugfs(m, NULL, false); return 0; } @@ -22,7 +22,57 @@ static const struct file_operations ptdump_fops = { .release = single_release, }; -static struct dentry *dir, *pe; +static int ptdump_show_curknl(struct seq_file *m, void *v) +{ + if (current->mm->pgd) { + down_read(¤t->mm->mmap_sem); + ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, false); + up_read(¤t->mm->mmap_sem); + } + return 0; +} + +static int ptdump_open_curknl(struct inode *inode, struct file *filp) +{ + return single_open(filp, ptdump_show_curknl, NULL); +} + +static const struct file_operations ptdump_curknl_fops = { + .owner = THIS_MODULE, + .open = ptdump_open_curknl, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; + +#ifdef CONFIG_PAGE_TABLE_ISOLATION +static struct dentry *pe_curusr; + +static int ptdump_show_curusr(struct seq_file *m, void *v) +{ + if (current->mm->pgd) { + down_read(¤t->mm->mmap_sem); + ptdump_walk_pgd_level_debugfs(m, current->mm->pgd, true); + up_read(¤t->mm->mmap_sem); + } + return 0; +} + +static int ptdump_open_curusr(struct inode *inode, struct file *filp) +{ + return single_open(filp, ptdump_show_curusr, NULL); +} + +static const struct file_operations ptdump_curusr_fops = { + .owner = THIS_MODULE, + .open = ptdump_open_curusr, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; +#endif + +static struct dentry *dir, *pe_knl, *pe_curknl; static int __init pt_dump_debug_init(void) { @@ -30,9 +80,22 @@ static int __init pt_dump_debug_init(void) if (!dir) return -ENOMEM; - pe = 
debugfs_create_file("kernel", 0400, dir, NULL, &ptdump_fops); - if (!pe) + pe_knl = debugfs_create_file("kernel", 0400, dir, NULL, + &ptdump_fops); + if (!pe_knl) + goto err; + + pe_curknl = debugfs_create_file("current_kernel", 0400, + dir, NULL, &ptdump_curknl_fops); + if (!pe_curknl) + goto err; + +#ifdef CONFIG_PAGE_TABLE_ISOLATION + pe_curusr = debugfs_create_file("current_user", 0400, + dir, NULL, &ptdump_curusr_fops); + if (!pe_curusr) goto err; +#endif return 0; err: debugfs_remove_recursive(dir); diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 17f5b417f95e..f56902c1f04b 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -530,8 +530,12 @@ void ptdump_walk_pgd_level(struct seq_file *m, pgd_t *pgd) ptdump_walk_pgd_level_core(m, pgd, false, true); } -void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd) +void ptdump_walk_pgd_level_debugfs(struct seq_file *m, pgd_t *pgd, bool user) { +#ifdef CONFIG_PAGE_TABLE_ISOLATION + if (user && static_cpu_has(X86_FEATURE_PTI)) + pgd = kernel_to_user_pgdp(pgd); +#endif ptdump_walk_pgd_level_core(m, pgd, false, false); } EXPORT_SYMBOL_GPL(ptdump_walk_pgd_level_debugfs); From e08acdb9620bcd4c55128cad07e625cd1825e533 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 15 Dec 2017 20:35:11 +0100 Subject: [PATCH 0840/1013] x86/ldt: Make the LDT mapping RO commit 9f5cb6b32d9e0a3a7453222baaf15664d92adbf2 upstream. Now that the LDT mapping is in a known area when PAGE_TABLE_ISOLATION is enabled its a primary target for attacks, if a user space interface fails to validate a write address correctly. That can never happen, right? The SDM states: If the segment descriptors in the GDT or an LDT are placed in ROM, the processor can enter an indefinite loop if software or the processor attempts to update (write to) the ROM-based segment descriptors. To prevent this problem, set the accessed bits for all segment descriptors placed in a ROM. Also, remove operating-system or executive code that attempts to modify segment descriptors located in ROM. So its a valid approach to set the ACCESS bit when setting up the LDT entry and to map the table RO. Fixup the selftest so it can handle that new mode. Remove the manual ACCESS bit setter in set_tls_desc() as this is now pointless. Folded the patch from Peter Ziljstra. Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: H. 
Peter Anvin Cc: Josh Poimboeuf Cc: Juergen Gross Cc: Linus Torvalds Cc: Peter Zijlstra Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/desc.h | 2 ++ arch/x86/kernel/ldt.c | 7 ++++++- arch/x86/kernel/tls.c | 11 ++--------- tools/testing/selftests/x86/ldt_gdt.c | 3 +-- 4 files changed, 11 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index bc359dd2f7f6..85e23bb7b34e 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -21,6 +21,8 @@ static inline void fill_ldt(struct desc_struct *desc, const struct user_desc *in desc->type = (info->read_exec_only ^ 1) << 1; desc->type |= info->contents << 2; + /* Set the ACCESS bit so it can be mapped RO */ + desc->type |= 1; desc->s = 1; desc->dpl = 0x3; diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 9629c5d8267a..579cc4a66fdf 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -158,7 +158,12 @@ map_ldt_struct(struct mm_struct *mm, struct ldt_struct *ldt, int slot) ptep = get_locked_pte(mm, va, &ptl); if (!ptep) return -ENOMEM; - pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL & ~_PAGE_GLOBAL)); + /* + * Map it RO so the easy to find address is not a primary + * target via some kernel interface which misses a + * permission check. + */ + pte = pfn_pte(pfn, __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL)); set_pte_at(mm, va, ptep, pte); pte_unmap_unlock(ptep, ptl); } diff --git a/arch/x86/kernel/tls.c b/arch/x86/kernel/tls.c index 9a9c9b076955..a5b802a12212 100644 --- a/arch/x86/kernel/tls.c +++ b/arch/x86/kernel/tls.c @@ -93,17 +93,10 @@ static void set_tls_desc(struct task_struct *p, int idx, cpu = get_cpu(); while (n-- > 0) { - if (LDT_empty(info) || LDT_zero(info)) { + if (LDT_empty(info) || LDT_zero(info)) memset(desc, 0, sizeof(*desc)); - } else { + else fill_ldt(desc, info); - - /* - * Always set the accessed bit so that the CPU - * doesn't try to write to the (read-only) GDT. - */ - desc->type |= 1; - } ++info; ++desc; } diff --git a/tools/testing/selftests/x86/ldt_gdt.c b/tools/testing/selftests/x86/ldt_gdt.c index 0304ffb714f2..1aef72df20a1 100644 --- a/tools/testing/selftests/x86/ldt_gdt.c +++ b/tools/testing/selftests/x86/ldt_gdt.c @@ -122,8 +122,7 @@ static void check_valid_segment(uint16_t index, int ldt, * NB: Different Linux versions do different things with the * accessed bit in set_thread_area(). */ - if (ar != expected_ar && - (ldt || ar != (expected_ar | AR_ACCESSED))) { + if (ar != expected_ar && ar != (expected_ar | AR_ACCESSED)) { printf("[FAIL]\t%s entry %hu has AR 0x%08X but expected 0x%08X\n", (ldt ? "LDT" : "GDT"), index, ar, expected_ar); nerrs++; From 66f833dbed02d39c44440b6b35ac088655c32edb Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 22 Dec 2017 20:32:35 -0500 Subject: [PATCH 0841/1013] ring-buffer: Mask out the info bits when returning buffer page length commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream. Two info bits were added to the "commit" part of the ring buffer data page when returned to be consumed. This was to inform the user space readers that events have been missed, and that the count may be stored at the end of the page. What wasn't handled, was the splice code that actually called a function to return the length of the data in order to zero out the rest of the page before sending it up to user space. These data bits were returned with the length making the value negative, and that negative value was not checked. 
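The info bits occupy the high bits of the 32-bit commit word, so once either of them is set the raw value is no longer usable as a length; the fix masks them off before adding the header size. A sketch using the flag names from the hunk below:

	/* Sketch: the data length actually handed to the splice path. */
	static size_t data_page_len(struct buffer_data_page *bpage)
	{
		/* Strip the missed-events info bits before using it as a size. */
		return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS)
			+ BUF_PAGE_HDR_SIZE;
	}
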
It was compared to PAGE_SIZE, and only used if the size was less than PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an unsigned compare, meaning the negative size value did not end up causing a large portion of memory to be randomly zeroed out. Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/ring_buffer.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 81279c6602ff..b37718c2d718 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -281,6 +281,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_event_data); /* Missed count stored at end */ #define RB_MISSED_STORED (1 << 30) +#define RB_MISSED_FLAGS (RB_MISSED_EVENTS|RB_MISSED_STORED) + struct buffer_data_page { u64 time_stamp; /* page time stamp */ local_t commit; /* write committed index */ @@ -332,7 +334,9 @@ static void rb_init_page(struct buffer_data_page *bpage) */ size_t ring_buffer_page_len(void *page) { - return local_read(&((struct buffer_data_page *)page)->commit) + struct buffer_data_page *bpage = page; + + return (local_read(&bpage->commit) & ~RB_MISSED_FLAGS) + BUF_PAGE_HDR_SIZE; } From 0aea6fb0e777bb89437c8883c289e05e1f153ec8 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (VMware)" Date: Fri, 22 Dec 2017 21:19:29 -0500 Subject: [PATCH 0842/1013] ring-buffer: Do no reuse reader page if still in use commit ae415fa4c5248a8cf4faabd5a3c20576cb1ad607 upstream. To free the reader page that is allocated with ring_buffer_alloc_read_page(), ring_buffer_free_read_page() must be called. For faster performance, this page can be reused by the ring buffer to avoid having to free and allocate new pages. The issue arises when the page is used with a splice pipe into the networking code. The networking code may up the page counter for the page, and keep it active while sending it is queued to go to the network. The incrementing of the page ref does not prevent it from being reused in the ring buffer, and this can cause the page that is being sent out to the network to be modified before it is sent by reading new data. Add a check to the page ref counter, and only reuse the page if it is not being used anywhere else. 
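The check itself is a single page reference count test done before the reader page is parked for reuse; if anything else (for instance a splice into the networking code) still holds the page, it is freed instead, which merely drops the ring buffer's own reference. A sketch of the guard, with a made-up helper name:

	/* Sketch: is it safe to recycle this reader page into the ring buffer? */
	static bool reader_page_unshared(void *data)
	{
		struct page *page = virt_to_page(data);

		return page_ref_count(page) <= 1;	/* no references besides our own */
	}
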
Fixes: 73a757e63114d ("ring-buffer: Return reader page back into existing ring buffer") Signed-off-by: Steven Rostedt (VMware) Signed-off-by: Greg Kroah-Hartman --- kernel/trace/ring_buffer.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index b37718c2d718..0476a9372014 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -4443,8 +4443,13 @@ void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void *data) { struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu]; struct buffer_data_page *bpage = data; + struct page *page = virt_to_page(bpage); unsigned long flags; + /* If the page is still in use someplace else, we can't reuse it */ + if (page_ref_count(page) > 1) + goto out; + local_irq_save(flags); arch_spin_lock(&cpu_buffer->lock); @@ -4456,6 +4461,7 @@ void ring_buffer_free_read_page(struct ring_buffer *buffer, int cpu, void *data) arch_spin_unlock(&cpu_buffer->lock); local_irq_restore(flags); + out: free_page((unsigned long)bpage); } EXPORT_SYMBOL_GPL(ring_buffer_free_read_page); From 2aec84963e5edf3c463c37da0a94fb4c12c2d2c2 Mon Sep 17 00:00:00 2001 From: Steve Wise Date: Mon, 18 Dec 2017 13:10:00 -0800 Subject: [PATCH 0843/1013] iw_cxgb4: Only validate the MSN for successful completions commit f55688c45442bc863f40ad678c638785b26cdce6 upstream. If the RECV CQE is in error, ignore the MSN check. This was causing recvs that were flushed into the sw cq to be completed with the wrong status (BAD_MSN instead of FLUSHED). Signed-off-by: Steve Wise Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/cxgb4/cq.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/infiniband/hw/cxgb4/cq.c b/drivers/infiniband/hw/cxgb4/cq.c index eae8ea81c6e2..514c1000ded1 100644 --- a/drivers/infiniband/hw/cxgb4/cq.c +++ b/drivers/infiniband/hw/cxgb4/cq.c @@ -586,10 +586,10 @@ static int poll_cq(struct t4_wq *wq, struct t4_cq *cq, struct t4_cqe *cqe, ret = -EAGAIN; goto skip_cqe; } - if (unlikely((CQE_WRID_MSN(hw_cqe) != (wq->rq.msn)))) { + if (unlikely(!CQE_STATUS(hw_cqe) && + CQE_WRID_MSN(hw_cqe) != wq->rq.msn)) { t4_set_wq_in_error(wq); - hw_cqe->header |= htonl(CQE_STATUS_V(T4_ERR_MSN)); - goto proc_cqe; + hw_cqe->header |= cpu_to_be32(CQE_STATUS_V(T4_ERR_MSN)); } goto proc_cqe; } From 23ef17a49f1edfa2c4b437c03b1c0d4da1d116d2 Mon Sep 17 00:00:00 2001 From: Srinivas Kandagatla Date: Thu, 30 Nov 2017 10:15:02 +0000 Subject: [PATCH 0844/1013] ASoC: codecs: msm8916-wcd: Fix supported formats commit 51f493ae71adc2c49a317a13c38e54e1cdf46005 upstream. This codec is configurable for only 16 bit and 32 bit samples, so reflect this in the supported formats also remove 24bit sample from supported list. 
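In driver terms this is just a change of the advertised format mask, roughly as sketched here (illustrative name; the real definitions keep the MSM8916_WCD_* names used in the hunks below):

	/* Sketch: only 16-bit and 32-bit sample containers are advertised. */
	#define EXAMPLE_WCD_FORMATS (SNDRV_PCM_FMTBIT_S16_LE | \
				     SNDRV_PCM_FMTBIT_S32_LE)
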
Signed-off-by: Srinivas Kandagatla Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/codecs/msm8916-wcd-analog.c | 2 +- sound/soc/codecs/msm8916-wcd-digital.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/sound/soc/codecs/msm8916-wcd-analog.c b/sound/soc/codecs/msm8916-wcd-analog.c index 18933bf6473f..8c7063e1aa46 100644 --- a/sound/soc/codecs/msm8916-wcd-analog.c +++ b/sound/soc/codecs/msm8916-wcd-analog.c @@ -267,7 +267,7 @@ #define MSM8916_WCD_ANALOG_RATES (SNDRV_PCM_RATE_8000 | SNDRV_PCM_RATE_16000 |\ SNDRV_PCM_RATE_32000 | SNDRV_PCM_RATE_48000) #define MSM8916_WCD_ANALOG_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\ - SNDRV_PCM_FMTBIT_S24_LE) + SNDRV_PCM_FMTBIT_S32_LE) static int btn_mask = SND_JACK_BTN_0 | SND_JACK_BTN_1 | SND_JACK_BTN_2 | SND_JACK_BTN_3 | SND_JACK_BTN_4; diff --git a/sound/soc/codecs/msm8916-wcd-digital.c b/sound/soc/codecs/msm8916-wcd-digital.c index 66df8f810f0d..694db27b11fa 100644 --- a/sound/soc/codecs/msm8916-wcd-digital.c +++ b/sound/soc/codecs/msm8916-wcd-digital.c @@ -194,7 +194,7 @@ SNDRV_PCM_RATE_32000 | \ SNDRV_PCM_RATE_48000) #define MSM8916_WCD_DIGITAL_FORMATS (SNDRV_PCM_FMTBIT_S16_LE |\ - SNDRV_PCM_FMTBIT_S24_LE) + SNDRV_PCM_FMTBIT_S32_LE) struct msm8916_wcd_digital_priv { struct clk *ahbclk, *mclk; @@ -645,7 +645,7 @@ static int msm8916_wcd_digital_hw_params(struct snd_pcm_substream *substream, RX_I2S_CTL_RX_I2S_MODE_MASK, RX_I2S_CTL_RX_I2S_MODE_16); break; - case SNDRV_PCM_FORMAT_S24_LE: + case SNDRV_PCM_FORMAT_S32_LE: snd_soc_update_bits(dai->codec, LPASS_CDC_CLK_TX_I2S_CTL, TX_I2S_CTL_TX_I2S_MODE_MASK, TX_I2S_CTL_TX_I2S_MODE_32); From 308ddf2afe83994719006200204dc819fcbd9e7a Mon Sep 17 00:00:00 2001 From: Ben Hutchings Date: Fri, 8 Dec 2017 16:15:20 +0000 Subject: [PATCH 0845/1013] ASoC: wm_adsp: Fix validation of firmware and coeff lengths commit 50dd2ea8ef67a1617e0c0658bcbec4b9fb03b936 upstream. The checks for whether another region/block header could be present are subtracting the size from the current offset. Obviously we should instead subtract the offset from the size. The checks for whether the region/block data fit in the file are adding the data size to the current offset and header size, without checking for integer overflow. Rearrange these so that overflow is impossible. 
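The rearranged comparisons follow a general rule for bounds checks on untrusted lengths: never add the untrusted value to the running offset; instead subtract the already-validated offset from the total size so neither side can wrap. A self-contained sketch of the idiom (hypothetical helper, kernel types assumed):

	/*
	 * Sketch: does a record with header size 'hdr' and untrusted payload
	 * length 'len' fit at offset 'pos' of a buffer of 'size' bytes?
	 * 'pos' is checked first, so the subtractions cannot underflow and
	 * 'len' is never added to anything before being compared.
	 */
	static bool record_fits(size_t pos, size_t hdr, u32 len, size_t size)
	{
		if (pos >= size || hdr >= size - pos)
			return false;			/* no room for the header */

		return len <= size - pos - hdr;		/* room for the payload? */
	}
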
Signed-off-by: Ben Hutchings Acked-by: Charles Keepax Tested-by: Charles Keepax Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/codecs/wm_adsp.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/sound/soc/codecs/wm_adsp.c b/sound/soc/codecs/wm_adsp.c index 65c059b5ffd7..66e32f5d2917 100644 --- a/sound/soc/codecs/wm_adsp.c +++ b/sound/soc/codecs/wm_adsp.c @@ -1733,7 +1733,7 @@ static int wm_adsp_load(struct wm_adsp *dsp) le64_to_cpu(footer->timestamp)); while (pos < firmware->size && - pos - firmware->size > sizeof(*region)) { + sizeof(*region) < firmware->size - pos) { region = (void *)&(firmware->data[pos]); region_name = "Unknown"; reg = 0; @@ -1782,8 +1782,8 @@ static int wm_adsp_load(struct wm_adsp *dsp) regions, le32_to_cpu(region->len), offset, region_name); - if ((pos + le32_to_cpu(region->len) + sizeof(*region)) > - firmware->size) { + if (le32_to_cpu(region->len) > + firmware->size - pos - sizeof(*region)) { adsp_err(dsp, "%s.%d: %s region len %d bytes exceeds file length %zu\n", file, regions, region_name, @@ -2253,7 +2253,7 @@ static int wm_adsp_load_coeff(struct wm_adsp *dsp) blocks = 0; while (pos < firmware->size && - pos - firmware->size > sizeof(*blk)) { + sizeof(*blk) < firmware->size - pos) { blk = (void *)(&firmware->data[pos]); type = le16_to_cpu(blk->type); @@ -2327,8 +2327,8 @@ static int wm_adsp_load_coeff(struct wm_adsp *dsp) } if (reg) { - if ((pos + le32_to_cpu(blk->len) + sizeof(*blk)) > - firmware->size) { + if (le32_to_cpu(blk->len) > + firmware->size - pos - sizeof(*blk)) { adsp_err(dsp, "%s.%d: %s region len %d bytes exceeds file length %zu\n", file, blocks, region_name, From c7d231ca5e0bf38f6920e92c97feadede0978d1a Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Nov 2017 12:12:55 +0100 Subject: [PATCH 0846/1013] ASoC: da7218: fix fix child-node lookup commit bc6476d6c1edcb9b97621b5131bd169aa81f27db upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent codec node was also prematurely freed. Fixes: 4d50934abd22 ("ASoC: da7218: Add da7218 codec driver") Signed-off-by: Johan Hovold Acked-by: Adam Thomson Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/codecs/da7218.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/codecs/da7218.c b/sound/soc/codecs/da7218.c index b2d42ec1dcd9..56564ce90cb6 100644 --- a/sound/soc/codecs/da7218.c +++ b/sound/soc/codecs/da7218.c @@ -2520,7 +2520,7 @@ static struct da7218_pdata *da7218_of_to_pdata(struct snd_soc_codec *codec) } if (da7218->dev_id == DA7218_DEV_ID) { - hpldet_np = of_find_node_by_name(np, "da7218_hpldet"); + hpldet_np = of_get_child_by_name(np, "da7218_hpldet"); if (!hpldet_np) return pdata; From fe9f7bd45c01b4bc328d2442136c5456608f1637 Mon Sep 17 00:00:00 2001 From: "Maciej S. Szmigiero" Date: Mon, 20 Nov 2017 23:14:55 +0100 Subject: [PATCH 0847/1013] ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure commit 695b78b548d8a26288f041e907ff17758df9e1d5 upstream. AC'97 ops (register read / write) need SSI regmap and clock, so they have to be set after them. We also need to set these ops back to NULL if we fail the probe. Signed-off-by: Maciej S. 
Szmigiero Acked-by: Nicolin Chen Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/fsl/fsl_ssi.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c index 64598d1183f8..3ffbb498cc70 100644 --- a/sound/soc/fsl/fsl_ssi.c +++ b/sound/soc/fsl/fsl_ssi.c @@ -1452,12 +1452,6 @@ static int fsl_ssi_probe(struct platform_device *pdev) sizeof(fsl_ssi_ac97_dai)); fsl_ac97_data = ssi_private; - - ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev); - if (ret) { - dev_err(&pdev->dev, "could not set AC'97 ops\n"); - return ret; - } } else { /* Initialize this copy of the CPU DAI driver structure */ memcpy(&ssi_private->cpu_dai_drv, &fsl_ssi_dai_template, @@ -1568,6 +1562,14 @@ static int fsl_ssi_probe(struct platform_device *pdev) return ret; } + if (fsl_ssi_is_ac97(ssi_private)) { + ret = snd_soc_set_ac97_ops_of_reset(&fsl_ssi_ac97_ops, pdev); + if (ret) { + dev_err(&pdev->dev, "could not set AC'97 ops\n"); + goto error_ac97_ops; + } + } + ret = devm_snd_soc_register_component(&pdev->dev, &fsl_ssi_component, &ssi_private->cpu_dai_drv, 1); if (ret) { @@ -1651,6 +1653,10 @@ static int fsl_ssi_probe(struct platform_device *pdev) fsl_ssi_debugfs_remove(&ssi_private->dbg_stats); error_asoc_register: + if (fsl_ssi_is_ac97(ssi_private)) + snd_soc_set_ac97_ops(NULL); + +error_ac97_ops: if (ssi_private->soc->imx) fsl_ssi_imx_clean(pdev, ssi_private); From 314d9cdf7e0fb83781e8fae836044856a449ffe0 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Nov 2017 12:12:56 +0100 Subject: [PATCH 0848/1013] ASoC: twl4030: fix child-node lookup commit 15f8c5f2415bfac73f33a14bcd83422bcbfb5298 upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. To make things worse, the parent codec node was also prematurely freed, while the child node was leaked. Fixes: 2d6d649a2e0f ("ASoC: twl4030: Support for DT booted kernel") Signed-off-by: Johan Hovold Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/codecs/twl4030.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/sound/soc/codecs/twl4030.c b/sound/soc/codecs/twl4030.c index c482b2e7a7d2..cfe72b9d4356 100644 --- a/sound/soc/codecs/twl4030.c +++ b/sound/soc/codecs/twl4030.c @@ -232,7 +232,7 @@ static struct twl4030_codec_data *twl4030_get_pdata(struct snd_soc_codec *codec) struct twl4030_codec_data *pdata = dev_get_platdata(codec->dev); struct device_node *twl4030_codec_node = NULL; - twl4030_codec_node = of_find_node_by_name(codec->dev->parent->of_node, + twl4030_codec_node = of_get_child_by_name(codec->dev->parent->of_node, "codec"); if (!pdata && twl4030_codec_node) { @@ -241,9 +241,11 @@ static struct twl4030_codec_data *twl4030_get_pdata(struct snd_soc_codec *codec) GFP_KERNEL); if (!pdata) { dev_err(codec->dev, "Can not allocate memory\n"); + of_node_put(twl4030_codec_node); return NULL; } twl4030_setup_pdata_of(pdata, twl4030_codec_node); + of_node_put(twl4030_codec_node); } return pdata; From 077cb91c9fc35d638e014374491ac6a9f41b19ba Mon Sep 17 00:00:00 2001 From: "Andrew F. Davis" Date: Wed, 29 Nov 2017 15:32:46 -0600 Subject: [PATCH 0849/1013] ASoC: tlv320aic31xx: Fix GPIO1 register definition commit 737e0b7b67bdfe24090fab2852044bb283282fc5 upstream. GPIO1 control register is number 51, fix this here. 
Fixes: bafcbfe429eb ("ASoC: tlv320aic31xx: Make the register values human readable") Signed-off-by: Andrew F. Davis Signed-off-by: Mark Brown Signed-off-by: Greg Kroah-Hartman --- sound/soc/codecs/tlv320aic31xx.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/soc/codecs/tlv320aic31xx.h b/sound/soc/codecs/tlv320aic31xx.h index 730fb2058869..1ff3edb7bbb6 100644 --- a/sound/soc/codecs/tlv320aic31xx.h +++ b/sound/soc/codecs/tlv320aic31xx.h @@ -116,7 +116,7 @@ struct aic31xx_pdata { /* INT2 interrupt control */ #define AIC31XX_INT2CTRL AIC31XX_REG(0, 49) /* GPIO1 control */ -#define AIC31XX_GPIO1 AIC31XX_REG(0, 50) +#define AIC31XX_GPIO1 AIC31XX_REG(0, 51) #define AIC31XX_DACPRB AIC31XX_REG(0, 60) /* ADC Instruction Set Register */ From 074e2892a4205b0c88742d9d000c0c19e0b98244 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 15 Dec 2017 15:02:33 +0100 Subject: [PATCH 0850/1013] gpio: fix "gpio-line-names" property retrieval commit 822703354774ec935169cbbc8d503236bcb54fda upstream. Following commit 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names() to use device property accessors"), "gpio-line-names" DT property is not retrieved anymore when chip->parent is not set by the driver. This is due to OF based property reads having been replaced by device based property reads. This patch fixes that by making use of fwnode_property_read_string_array() instead of device_property_read_string_array() and handing over either of_fwnode_handle(chip->of_node) or dev_fwnode(chip->parent) to that function. Fixes: 9427ecbed46cc ("gpio: Rework of_gpiochip_set_names() to use device property accessors") Signed-off-by: Christophe Leroy Acked-by: Mika Westerberg Signed-off-by: Linus Walleij Signed-off-by: Greg Kroah-Hartman --- drivers/gpio/gpiolib-acpi.c | 2 +- drivers/gpio/gpiolib-devprop.c | 17 +++++++---------- drivers/gpio/gpiolib-of.c | 3 ++- drivers/gpio/gpiolib.h | 3 ++- 4 files changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c index eb4528c87c0b..d6f3d9ee1350 100644 --- a/drivers/gpio/gpiolib-acpi.c +++ b/drivers/gpio/gpiolib-acpi.c @@ -1074,7 +1074,7 @@ void acpi_gpiochip_add(struct gpio_chip *chip) } if (!chip->names) - devprop_gpiochip_set_names(chip); + devprop_gpiochip_set_names(chip, dev_fwnode(chip->parent)); acpi_gpiochip_request_regions(acpi_gpio); acpi_gpiochip_scan_gpios(acpi_gpio); diff --git a/drivers/gpio/gpiolib-devprop.c b/drivers/gpio/gpiolib-devprop.c index 27f383bda7d9..f748aa3e77f7 100644 --- a/drivers/gpio/gpiolib-devprop.c +++ b/drivers/gpio/gpiolib-devprop.c @@ -19,30 +19,27 @@ /** * devprop_gpiochip_set_names - Set GPIO line names using device properties * @chip: GPIO chip whose lines should be named, if possible + * @fwnode: Property Node containing the gpio-line-names property * * Looks for device property "gpio-line-names" and if it exists assigns * GPIO line names for the chip. The memory allocated for the assigned * names belong to the underlying firmware node and should not be released * by the caller. 
*/ -void devprop_gpiochip_set_names(struct gpio_chip *chip) +void devprop_gpiochip_set_names(struct gpio_chip *chip, + const struct fwnode_handle *fwnode) { struct gpio_device *gdev = chip->gpiodev; const char **names; int ret, i; - if (!chip->parent) { - dev_warn(&gdev->dev, "GPIO chip parent is NULL\n"); - return; - } - - ret = device_property_read_string_array(chip->parent, "gpio-line-names", + ret = fwnode_property_read_string_array(fwnode, "gpio-line-names", NULL, 0); if (ret < 0) return; if (ret != gdev->ngpio) { - dev_warn(chip->parent, + dev_warn(&gdev->dev, "names %d do not match number of GPIOs %d\n", ret, gdev->ngpio); return; @@ -52,10 +49,10 @@ void devprop_gpiochip_set_names(struct gpio_chip *chip) if (!names) return; - ret = device_property_read_string_array(chip->parent, "gpio-line-names", + ret = fwnode_property_read_string_array(fwnode, "gpio-line-names", names, gdev->ngpio); if (ret < 0) { - dev_warn(chip->parent, "failed to read GPIO line names\n"); + dev_warn(&gdev->dev, "failed to read GPIO line names\n"); kfree(names); return; } diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c index bfcd20699ec8..ba38f530e403 100644 --- a/drivers/gpio/gpiolib-of.c +++ b/drivers/gpio/gpiolib-of.c @@ -493,7 +493,8 @@ int of_gpiochip_add(struct gpio_chip *chip) /* If the chip defines names itself, these take precedence */ if (!chip->names) - devprop_gpiochip_set_names(chip); + devprop_gpiochip_set_names(chip, + of_fwnode_handle(chip->of_node)); of_node_get(chip->of_node); diff --git a/drivers/gpio/gpiolib.h b/drivers/gpio/gpiolib.h index d003ccb12781..3d4d0634c9dd 100644 --- a/drivers/gpio/gpiolib.h +++ b/drivers/gpio/gpiolib.h @@ -224,7 +224,8 @@ static inline int gpio_chip_hwgpio(const struct gpio_desc *desc) return desc - &desc->gdev->descs[0]; } -void devprop_gpiochip_set_names(struct gpio_chip *chip); +void devprop_gpiochip_set_names(struct gpio_chip *chip, + const struct fwnode_handle *fwnode); /* With descriptor prefix */ From 907145e68e21947ec76d99827ab4eeeecff4a34c Mon Sep 17 00:00:00 2001 From: "Michael J. Ruhl" Date: Fri, 22 Dec 2017 08:47:20 -0800 Subject: [PATCH 0851/1013] IB/hfi: Only read capability registers if the capability exists commit 4c009af473b2026caaa26107e34d7cc68dad7756 upstream. During driver init, various registers are saved to allow restoration after an FLR or gen3 bump. Some of these registers are not available in some circumstances (i.e. Virtual machines). This bug makes the driver unusable when the PCI device is passed into a VM, it fails during probe. Delete unnecessary register read/write, and only access register if the capability exists. Fixes: a618b7e40af2 ("IB/hfi1: Move saving PCI values to a separate function") Reviewed-by: Mike Marciniszyn Signed-off-by: Michael J. 
Ruhl Signed-off-by: Dennis Dalessandro Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/hfi1/hfi.h | 1 - drivers/infiniband/hw/hfi1/pcie.c | 30 ++++++++++++------------------ 2 files changed, 12 insertions(+), 19 deletions(-) diff --git a/drivers/infiniband/hw/hfi1/hfi.h b/drivers/infiniband/hw/hfi1/hfi.h index 6ff44dc606eb..3409eee16092 100644 --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1129,7 +1129,6 @@ struct hfi1_devdata { u16 pcie_lnkctl; u16 pcie_devctl2; u32 pci_msix0; - u32 pci_lnkctl3; u32 pci_tph2; /* diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c index 09e50fd2a08f..8c7e7a60b715 100644 --- a/drivers/infiniband/hw/hfi1/pcie.c +++ b/drivers/infiniband/hw/hfi1/pcie.c @@ -411,15 +411,12 @@ int restore_pci_variables(struct hfi1_devdata *dd) if (ret) goto error; - ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_SPCIE1, - dd->pci_lnkctl3); - if (ret) - goto error; - - ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, dd->pci_tph2); - if (ret) - goto error; - + if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) { + ret = pci_write_config_dword(dd->pcidev, PCIE_CFG_TPH2, + dd->pci_tph2); + if (ret) + goto error; + } return 0; error: @@ -469,15 +466,12 @@ int save_pci_variables(struct hfi1_devdata *dd) if (ret) goto error; - ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_SPCIE1, - &dd->pci_lnkctl3); - if (ret) - goto error; - - ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, &dd->pci_tph2); - if (ret) - goto error; - + if (pci_find_ext_capability(dd->pcidev, PCI_EXT_CAP_ID_TPH)) { + ret = pci_read_config_dword(dd->pcidev, PCIE_CFG_TPH2, + &dd->pci_tph2); + if (ret) + goto error; + } return 0; error: From d471542b9f0706b9d37dba40bc02ef27b05a6bc1 Mon Sep 17 00:00:00 2001 From: Majd Dibbiny Date: Sun, 24 Dec 2017 13:54:56 +0200 Subject: [PATCH 0852/1013] IB/mlx5: Serialize access to the VMA list commit ad9a3668a434faca1339789ed2f043d679199309 upstream. User-space applications can do mmap and munmap directly at any time. Since the VMA list is not protected with a mutex, concurrent accesses to the VMA list from the mmap and munmap can cause data corruption. Add a mutex around the list. Fixes: 7c2344c3bbf9 ("IB/mlx5: Implements disassociate_ucontext API") Reviewed-by: Yishai Hadas Signed-off-by: Majd Dibbiny Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/hw/mlx5/main.c | 8 ++++++++ drivers/infiniband/hw/mlx5/mlx5_ib.h | 4 ++++ 2 files changed, 12 insertions(+) diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 5aff1e33d984..30d479f87cb8 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -1415,6 +1415,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev, } INIT_LIST_HEAD(&context->vma_private_list); + mutex_init(&context->vma_private_list_mutex); INIT_LIST_HEAD(&context->db_page_list); mutex_init(&context->db_page_mutex); @@ -1576,7 +1577,9 @@ static void mlx5_ib_vma_close(struct vm_area_struct *area) * mlx5_ib_disassociate_ucontext(). 
*/ mlx5_ib_vma_priv_data->vma = NULL; + mutex_lock(mlx5_ib_vma_priv_data->vma_private_list_mutex); list_del(&mlx5_ib_vma_priv_data->list); + mutex_unlock(mlx5_ib_vma_priv_data->vma_private_list_mutex); kfree(mlx5_ib_vma_priv_data); } @@ -1596,10 +1599,13 @@ static int mlx5_ib_set_vma_data(struct vm_area_struct *vma, return -ENOMEM; vma_prv->vma = vma; + vma_prv->vma_private_list_mutex = &ctx->vma_private_list_mutex; vma->vm_private_data = vma_prv; vma->vm_ops = &mlx5_ib_vm_ops; + mutex_lock(&ctx->vma_private_list_mutex); list_add(&vma_prv->list, vma_head); + mutex_unlock(&ctx->vma_private_list_mutex); return 0; } @@ -1642,6 +1648,7 @@ static void mlx5_ib_disassociate_ucontext(struct ib_ucontext *ibcontext) * mlx5_ib_vma_close. */ down_write(&owning_mm->mmap_sem); + mutex_lock(&context->vma_private_list_mutex); list_for_each_entry_safe(vma_private, n, &context->vma_private_list, list) { vma = vma_private->vma; @@ -1656,6 +1663,7 @@ static void mlx5_ib_disassociate_ucontext(struct ib_ucontext *ibcontext) list_del(&vma_private->list); kfree(vma_private); } + mutex_unlock(&context->vma_private_list_mutex); up_write(&owning_mm->mmap_sem); mmput(owning_mm); put_task_struct(owning_process); diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h index 189e80cd6b2f..754103372faa 100644 --- a/drivers/infiniband/hw/mlx5/mlx5_ib.h +++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h @@ -115,6 +115,8 @@ enum { struct mlx5_ib_vma_private_data { struct list_head list; struct vm_area_struct *vma; + /* protect vma_private_list add/del */ + struct mutex *vma_private_list_mutex; }; struct mlx5_ib_ucontext { @@ -129,6 +131,8 @@ struct mlx5_ib_ucontext { /* Transport Domain number */ u32 tdn; struct list_head vma_private_list; + /* protect vma_private_list add/del */ + struct mutex vma_private_list_mutex; unsigned long upd_xlt_page; /* protect ODP/KSM */ From 5f3b36984c7b15e9d6ace5fc836b3474a8bb4213 Mon Sep 17 00:00:00 2001 From: Moni Shoua Date: Sun, 24 Dec 2017 13:54:57 +0200 Subject: [PATCH 0853/1013] IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp() commit 05d14e7b0c138cb07ba30e464f47b39434f3fdef upstream. If the input command length is larger than the kernel supports an error should be returned in case the unsupported bytes are not cleared, instead of the other way aroudn. This matches what all other callers of ib_is_udata_cleared do and will avoid user ABI problems in the future. 
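The convention for extensible uAPI commands is that any bytes beyond what the kernel understands must be zero; only then may they be ignored safely. A rough sketch of that rule with placeholder names (this is not the uverbs helper itself):

    #include <errno.h>
    #include <stddef.h>

    /* Sketch: accept a user command larger than the known structure only
     * if every unknown trailing byte is zero. */
    static int check_cmd_tail(const unsigned char *buf, size_t in_len,
                              size_t known_len)
    {
            size_t i;

            if (in_len <= known_len)
                    return 0;               /* nothing beyond what we know */

            for (i = known_len; i < in_len; i++)
                    if (buf[i] != 0)
                            return -EOPNOTSUPP; /* cannot ignore these bytes */

            return 0;
    }

The bug fixed below was simply the inverted condition: the error path fired when the trailing bytes were cleared instead of when they were not.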
Fixes: 189aba99e700 ("IB/uverbs: Extend modify_qp and support packet pacing") Reviewed-by: Yishai Hadas Signed-off-by: Moni Shoua Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/uverbs_cmd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index d8f540054392..93c1a57dbff1 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -2085,8 +2085,8 @@ int ib_uverbs_ex_modify_qp(struct ib_uverbs_file *file, return -EOPNOTSUPP; if (ucore->inlen > sizeof(cmd)) { - if (ib_is_udata_cleared(ucore, sizeof(cmd), - ucore->inlen - sizeof(cmd))) + if (!ib_is_udata_cleared(ucore, sizeof(cmd), + ucore->inlen - sizeof(cmd))) return -EOPNOTSUPP; } From af0dc162f644335c50e2384ba485e4aa79b25e28 Mon Sep 17 00:00:00 2001 From: Moni Shoua Date: Sun, 24 Dec 2017 13:54:58 +0200 Subject: [PATCH 0854/1013] IB/core: Verify that QP is security enabled in create and destroy commit 4a50881bbac309e6f0684816a180bc3c14e1485d upstream. The XRC target QP create flow sets up qp_sec only if there is an IB link with LSM security enabled. However, several other related uAPI entry points blindly follow the qp_sec NULL pointer, resulting in a possible oops. Check for NULL before using qp_sec. Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Reviewed-by: Daniel Jurgens Signed-off-by: Moni Shoua Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe Signed-off-by: Greg Kroah-Hartman --- drivers/infiniband/core/security.c | 3 +++ drivers/infiniband/core/verbs.c | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/core/security.c b/drivers/infiniband/core/security.c index feafdb961c48..59b2f96d986a 100644 --- a/drivers/infiniband/core/security.c +++ b/drivers/infiniband/core/security.c @@ -386,6 +386,9 @@ int ib_open_shared_qp_security(struct ib_qp *qp, struct ib_device *dev) if (ret) return ret; + if (!qp->qp_sec) + return 0; + mutex_lock(&real_qp->qp_sec->mutex); ret = check_qp_port_pkey_settings(real_qp->qp_sec->ports_pkeys, qp->qp_sec); diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index de57d6c11a25..9032f77cc38d 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -1400,7 +1400,8 @@ int ib_close_qp(struct ib_qp *qp) spin_unlock_irqrestore(&real_qp->device->event_handler_lock, flags); atomic_dec(&real_qp->usecnt); - ib_close_shared_qp_security(qp->qp_sec); + if (qp->qp_sec) + ib_close_shared_qp_security(qp->qp_sec); kfree(qp); return 0; From 056305595a9940005ff34d0a67e2a2b231d162b7 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Fri, 22 Dec 2017 10:45:07 +0100 Subject: [PATCH 0855/1013] ALSA: hda: Drop useless WARN_ON() commit a36c2638380c0a4676647a1f553b70b20d3ebce1 upstream. Since the commit 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path") cleared hdac_acomp pointer, the WARN_ON() non-NULL check in snd_hdac_i915_register_notifier() may give a false-positive warning, as the function gets called no matter whether the component is registered or not. For fixing it, let's get rid of the spurious WARN_ON(). 
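As a rule of thumb, WARN_ON() is reserved for invariants that should never be false, while a dependency that can legitimately be absent at call time should just be reported through the return value. A tiny illustrative sketch with generic names (not the hdac code):

    #include <errno.h>

    struct provider { const void *ops; };

    static struct provider *active;     /* NULL until a provider binds */

    /* Sketch: absence of the provider is a normal state here, so report
     * it quietly with -ENODEV rather than warning with a backtrace. */
    static int attach_ops(const void *ops)
    {
            if (!active)
                    return -ENODEV;

            active->ops = ops;
            return 0;
    }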
Fixes: 97cc2ed27e5a ("ALSA: hda - Fix yet another i915 pointer leftover in error path") Reported-by: Kouta Okamoto Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/hda/hdac_i915.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sound/hda/hdac_i915.c b/sound/hda/hdac_i915.c index 038a180d3f81..cbe818eda336 100644 --- a/sound/hda/hdac_i915.c +++ b/sound/hda/hdac_i915.c @@ -325,7 +325,7 @@ static int hdac_component_master_match(struct device *dev, void *data) */ int snd_hdac_i915_register_notifier(const struct i915_audio_component_audio_ops *aops) { - if (WARN_ON(!hdac_acomp)) + if (!hdac_acomp) return -ENODEV; hdac_acomp->audio_ops = aops; From 2845bbd1ef1f574ce2bbd7a7f6feab04ca137cf3 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Fri, 22 Dec 2017 11:17:44 +0800 Subject: [PATCH 0856/1013] ALSA: hda - Add MIC_NO_PRESENCE fixup for 2 HP machines commit 322f74ede933b3e2cb78768b6a6fdbfbf478a0c1 upstream. There is a headset jack on the front panel, when we plug a headset into it, the headset mic can't trigger unsol events, and read_pin_sense() can't detect its presence too. So add this fixup to fix this issue. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_conexant.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/sound/pci/hda/patch_conexant.c b/sound/pci/hda/patch_conexant.c index a81aacf684b2..37e1cf8218ff 100644 --- a/sound/pci/hda/patch_conexant.c +++ b/sound/pci/hda/patch_conexant.c @@ -271,6 +271,8 @@ enum { CXT_FIXUP_HP_SPECTRE, CXT_FIXUP_HP_GATE_MIC, CXT_FIXUP_MUTE_LED_GPIO, + CXT_FIXUP_HEADSET_MIC, + CXT_FIXUP_HP_MIC_NO_PRESENCE, }; /* for hda_fixup_thinkpad_acpi() */ @@ -350,6 +352,18 @@ static void cxt_fixup_headphone_mic(struct hda_codec *codec, } } +static void cxt_fixup_headset_mic(struct hda_codec *codec, + const struct hda_fixup *fix, int action) +{ + struct conexant_spec *spec = codec->spec; + + switch (action) { + case HDA_FIXUP_ACT_PRE_PROBE: + spec->parse_flags |= HDA_PINCFG_HEADSET_MIC; + break; + } +} + /* OPLC XO 1.5 fixup */ /* OLPC XO-1.5 supports DC input mode (e.g. 
for use with analog sensors) @@ -880,6 +894,19 @@ static const struct hda_fixup cxt_fixups[] = { .type = HDA_FIXUP_FUNC, .v.func = cxt_fixup_mute_led_gpio, }, + [CXT_FIXUP_HEADSET_MIC] = { + .type = HDA_FIXUP_FUNC, + .v.func = cxt_fixup_headset_mic, + }, + [CXT_FIXUP_HP_MIC_NO_PRESENCE] = { + .type = HDA_FIXUP_PINS, + .v.pins = (const struct hda_pintbl[]) { + { 0x1a, 0x02a1113c }, + { } + }, + .chained = true, + .chain_id = CXT_FIXUP_HEADSET_MIC, + }, }; static const struct snd_pci_quirk cxt5045_fixups[] = { @@ -934,6 +961,8 @@ static const struct snd_pci_quirk cxt5066_fixups[] = { SND_PCI_QUIRK(0x103c, 0x8115, "HP Z1 Gen3", CXT_FIXUP_HP_GATE_MIC), SND_PCI_QUIRK(0x103c, 0x814f, "HP ZBook 15u G3", CXT_FIXUP_MUTE_LED_GPIO), SND_PCI_QUIRK(0x103c, 0x822e, "HP ProBook 440 G4", CXT_FIXUP_MUTE_LED_GPIO), + SND_PCI_QUIRK(0x103c, 0x8299, "HP 800 G3 SFF", CXT_FIXUP_HP_MIC_NO_PRESENCE), + SND_PCI_QUIRK(0x103c, 0x829a, "HP 800 G3 DM", CXT_FIXUP_HP_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1043, 0x138d, "Asus", CXT_FIXUP_HEADPHONE_MIC_PIN), SND_PCI_QUIRK(0x152d, 0x0833, "OLPC XO-1.5", CXT_FIXUP_OLPC_XO), SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400", CXT_PINCFG_LENOVO_TP410), From 3d858b85e376b08db9c9de42aa4e0b9b47fac466 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Fri, 22 Dec 2017 11:17:46 +0800 Subject: [PATCH 0857/1013] ALSA: hda - change the location for one mic on a Lenovo machine commit 8da5bbfc7cbba909f4f32d5e1dda3750baa5d853 upstream. There are two front mics on this machine, and current driver assign the same name Mic to both of them, but pulseaudio can't handle them. As a workaround, we change the location for one of them, then the driver will assign "Front Mic" and "Mic" for them. Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 9ac4b9076ee2..b32ea9d66f63 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6305,6 +6305,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x17aa, 0x30bb, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY), SND_PCI_QUIRK(0x17aa, 0x30e2, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY), SND_PCI_QUIRK(0x17aa, 0x310c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION), + SND_PCI_QUIRK(0x17aa, 0x313c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION), SND_PCI_QUIRK(0x17aa, 0x3112, "ThinkCentre AIO", ALC233_FIXUP_LENOVO_LINE2_MIC_HOTKEY), SND_PCI_QUIRK(0x17aa, 0x3902, "Lenovo E50-80", ALC269_FIXUP_DMIC_THINKPAD_ACPI), SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC), From 9fcd2ae2abb5efb67f331f567ed22a7305461a80 Mon Sep 17 00:00:00 2001 From: Hui Wang Date: Fri, 22 Dec 2017 11:17:45 +0800 Subject: [PATCH 0858/1013] ALSA: hda - fix headset mic detection issue on a Dell machine commit 285d5ddcffafa5d5e68c586f4c9eaa8b24a2897d upstream. It has the codec alc256, and add its pin definition to pin quirk table to let it apply ALC255_FIXUP_DELL1_MIC_NO_PRESENCE. 
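For background, a pin quirk entry of this kind is matched roughly as sketched below: the codec vendor/device ID and the PCI subsystem vendor must match, and every pin listed in the entry must report exactly the given default configuration; only then is the named fixup applied. The helper names here are invented for illustration, the real matching lives in the HDA core:

    #include <stddef.h>

    struct pin_cfg   { unsigned char nid; unsigned int val; };
    struct pin_quirk {
            unsigned int codec_id;          /* e.g. 0x10ec0256 */
            unsigned short subvendor;       /* e.g. 0x1028 (Dell) */
            const struct pin_cfg *pins;     /* terminated by nid == 0 */
            int fixup_id;
    };

    /* Sketch: return the fixup for the first entry whose codec, subsystem
     * vendor and complete pin configuration all match. */
    static int pick_fixup(const struct pin_quirk *tbl, size_t n,
                          unsigned int codec_id, unsigned short subvendor,
                          unsigned int (*read_defcfg)(unsigned char nid))
    {
            const struct pin_cfg *p;
            size_t i;

            for (i = 0; i < n; i++) {
                    if (tbl[i].codec_id != codec_id ||
                        tbl[i].subvendor != subvendor)
                            continue;
                    for (p = tbl[i].pins; p->nid; p++)
                            if (read_defcfg(p->nid) != p->val)
                                    break;
                    if (!p->nid)            /* every listed pin matched */
                            return tbl[i].fixup_id;
            }
            return -1;                      /* fall back to defaults */
    }

This is why the new table entry below lists pins 0x12, 0x14, 0x1b and 0x21 together with their default configuration values: that combination helps identify the specific Dell model.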
Signed-off-by: Hui Wang Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index b32ea9d66f63..7ccc965ffdd6 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6558,6 +6558,11 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = { SND_HDA_PIN_QUIRK(0x10ec0255, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, {0x1b, 0x01011020}, {0x21, 0x02211010}), + SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + {0x12, 0x90a60130}, + {0x14, 0x90170110}, + {0x1b, 0x01011020}, + {0x21, 0x0221101f}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, {0x12, 0x90a60160}, {0x14, 0x90170120}, From 9c5ee053a67eb374878b20eab7d16feb2a87a851 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Wed, 27 Dec 2017 08:53:59 +0100 Subject: [PATCH 0859/1013] ALSA: hda - Fix missing COEF init for ALC225/295/299 commit 44be77c590f381bc629815ac789b8b15ecc4ddcf upstream. There was a long-standing problem on HP Spectre X360 with Kabylake where it lacks of the front speaker output in some situations. Also there are other products showing the similar behavior. The culprit seems to be the missing COEF setup on ALC codecs, ALC225/295/299, which are all compatible. This patch adds the proper COEF setup (to initialize idx 0x67 / bits 0x3000) for addressing the issue. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195457 Signed-off-by: Takashi Iwai Signed-off-by: Greg Kroah-Hartman --- sound/pci/hda/patch_realtek.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 7ccc965ffdd6..acdb196ddb44 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -324,8 +324,12 @@ static void alc_fill_eapd_coef(struct hda_codec *codec) case 0x10ec0292: alc_update_coef_idx(codec, 0x4, 1<<15, 0); break; - case 0x10ec0215: case 0x10ec0225: + case 0x10ec0295: + case 0x10ec0299: + alc_update_coef_idx(codec, 0x67, 0xf000, 0x3000); + /* fallthrough */ + case 0x10ec0215: case 0x10ec0233: case 0x10ec0236: case 0x10ec0255: @@ -336,10 +340,8 @@ static void alc_fill_eapd_coef(struct hda_codec *codec) case 0x10ec0286: case 0x10ec0288: case 0x10ec0285: - case 0x10ec0295: case 0x10ec0298: case 0x10ec0289: - case 0x10ec0299: alc_update_coef_idx(codec, 0x10, 1<<9, 0); break; case 0x10ec0275: From 0c688c288f8e82fdb5f3e0d0f3b2b1f3498ad187 Mon Sep 17 00:00:00 2001 From: Joel Fernandes Date: Thu, 21 Dec 2017 02:22:45 +0100 Subject: [PATCH 0860/1013] cpufreq: schedutil: Use idle_calls counter of the remote CPU commit 466a2b42d67644447a1765276259a3ea5531ddff upstream. Since the recent remote cpufreq callback work, its possible that a cpufreq update is triggered from a remote CPU. For single policies however, the current code uses the local CPU when trying to determine if the remote sg_cpu entered idle or is busy. This is incorrect. To remedy this, compare with the nohz tick idle_calls counter of the remote CPU. Fixes: 674e75411fc2 (sched: cpufreq: Allow remote cpufreq callbacks) Acked-by: Viresh Kumar Acked-by: Peter Zijlstra (Intel) Signed-off-by: Joel Fernandes Signed-off-by: Rafael J. 
Wysocki Signed-off-by: Greg Kroah-Hartman --- include/linux/tick.h | 1 + kernel/sched/cpufreq_schedutil.c | 2 +- kernel/time/tick-sched.c | 13 +++++++++++++ 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/include/linux/tick.h b/include/linux/tick.h index cf413b344ddb..5cdac11dd317 100644 --- a/include/linux/tick.h +++ b/include/linux/tick.h @@ -119,6 +119,7 @@ extern void tick_nohz_idle_exit(void); extern void tick_nohz_irq_exit(void); extern ktime_t tick_nohz_get_sleep_length(void); extern unsigned long tick_nohz_get_idle_calls(void); +extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu); extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time); extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time); #else /* !CONFIG_NO_HZ_COMMON */ diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 2f52ec0f1539..d6717a3331a1 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -244,7 +244,7 @@ static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util, #ifdef CONFIG_NO_HZ_COMMON static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { - unsigned long idle_calls = tick_nohz_get_idle_calls(); + unsigned long idle_calls = tick_nohz_get_idle_calls_cpu(sg_cpu->cpu); bool ret = idle_calls == sg_cpu->saved_idle_calls; sg_cpu->saved_idle_calls = idle_calls; diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index c7a899c5ce64..e4d463adefc2 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1009,6 +1009,19 @@ ktime_t tick_nohz_get_sleep_length(void) return ts->sleep_length; } +/** + * tick_nohz_get_idle_calls_cpu - return the current idle calls counter value + * for a particular CPU. + * + * Called from the schedutil frequency scaling governor in scheduler context. + */ +unsigned long tick_nohz_get_idle_calls_cpu(int cpu) +{ + struct tick_sched *ts = tick_get_tick_sched(cpu); + + return ts->idle_calls; +} + /** * tick_nohz_get_idle_calls - return the current idle calls counter value * From 88da02868f7741cc7c1d895840e236e6df5483ef Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Mon, 18 Dec 2017 15:40:44 +0800 Subject: [PATCH 0861/1013] block: fix blk_rq_append_bio commit 0abc2a10389f0c9070f76ca906c7382788036b93 upstream. Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio) moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider the fact that the bounced bio becomes invisible to caller since the parameter type is 'struct bio *'. Make it a pointer to a pointer to a bio, so the caller sees the right bio also after a bounce. Fixes: caa4b02476e3 ("blk-map: call blk_queue_bounce from blk_rq_append_bio") Cc: Christoph Hellwig Reported-by: Michele Ballabio (handling failure of blk_rq_append_bio(), only call bio_get() after blk_rq_append_bio() returns OK) Tested-by: Michele Ballabio Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- block/blk-map.c | 38 +++++++++++++++++------------- drivers/scsi/osd/osd_initiator.c | 4 +++- drivers/target/target_core_pscsi.c | 4 ++-- include/linux/blkdev.h | 2 +- 4 files changed, 28 insertions(+), 20 deletions(-) diff --git a/block/blk-map.c b/block/blk-map.c index d5251edcc0dd..368daa02714e 100644 --- a/block/blk-map.c +++ b/block/blk-map.c @@ -12,22 +12,29 @@ #include "blk.h" /* - * Append a bio to a passthrough request. Only works can be merged into - * the request based on the driver constraints. + * Append a bio to a passthrough request. 
Only works if the bio can be merged + * into the request based on the driver constraints. */ -int blk_rq_append_bio(struct request *rq, struct bio *bio) +int blk_rq_append_bio(struct request *rq, struct bio **bio) { - blk_queue_bounce(rq->q, &bio); + struct bio *orig_bio = *bio; + + blk_queue_bounce(rq->q, bio); if (!rq->bio) { - blk_rq_bio_prep(rq->q, rq, bio); + blk_rq_bio_prep(rq->q, rq, *bio); } else { - if (!ll_back_merge_fn(rq->q, rq, bio)) + if (!ll_back_merge_fn(rq->q, rq, *bio)) { + if (orig_bio != *bio) { + bio_put(*bio); + *bio = orig_bio; + } return -EINVAL; + } - rq->biotail->bi_next = bio; - rq->biotail = bio; - rq->__data_len += bio->bi_iter.bi_size; + rq->biotail->bi_next = *bio; + rq->biotail = *bio; + rq->__data_len += (*bio)->bi_iter.bi_size; } return 0; @@ -80,14 +87,12 @@ static int __blk_rq_map_user_iov(struct request *rq, * We link the bounce buffer in and could have to traverse it * later so we have to get a ref to prevent it from being freed */ - ret = blk_rq_append_bio(rq, bio); - bio_get(bio); + ret = blk_rq_append_bio(rq, &bio); if (ret) { - bio_endio(bio); __blk_rq_unmap_user(orig_bio); - bio_put(bio); return ret; } + bio_get(bio); return 0; } @@ -220,7 +225,7 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf, int reading = rq_data_dir(rq) == READ; unsigned long addr = (unsigned long) kbuf; int do_copy = 0; - struct bio *bio; + struct bio *bio, *orig_bio; int ret; if (len > (queue_max_hw_sectors(q) << 9)) @@ -243,10 +248,11 @@ int blk_rq_map_kern(struct request_queue *q, struct request *rq, void *kbuf, if (do_copy) rq->rq_flags |= RQF_COPY_USER; - ret = blk_rq_append_bio(rq, bio); + orig_bio = bio; + ret = blk_rq_append_bio(rq, &bio); if (unlikely(ret)) { /* request is too big */ - bio_put(bio); + bio_put(orig_bio); return ret; } diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c index a4f28b7e4c65..e18877177f1b 100644 --- a/drivers/scsi/osd/osd_initiator.c +++ b/drivers/scsi/osd/osd_initiator.c @@ -1576,7 +1576,9 @@ static struct request *_make_request(struct request_queue *q, bool has_write, return req; for_each_bio(bio) { - ret = blk_rq_append_bio(req, bio); + struct bio *bounce_bio = bio; + + ret = blk_rq_append_bio(req, &bounce_bio); if (ret) return ERR_PTR(ret); } diff --git a/drivers/target/target_core_pscsi.c b/drivers/target/target_core_pscsi.c index 7c69b4a9694d..0d99b242e82e 100644 --- a/drivers/target/target_core_pscsi.c +++ b/drivers/target/target_core_pscsi.c @@ -920,7 +920,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, " %d i: %d bio: %p, allocating another" " bio\n", bio->bi_vcnt, i, bio); - rc = blk_rq_append_bio(req, bio); + rc = blk_rq_append_bio(req, &bio); if (rc) { pr_err("pSCSI: failed to append bio\n"); goto fail; @@ -938,7 +938,7 @@ pscsi_map_sg(struct se_cmd *cmd, struct scatterlist *sgl, u32 sgl_nents, } if (bio) { - rc = blk_rq_append_bio(req, bio); + rc = blk_rq_append_bio(req, &bio); if (rc) { pr_err("pSCSI: failed to append bio\n"); goto fail; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index fd47bd96b5d3..1329dca3a914 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -952,7 +952,7 @@ extern int blk_rq_prep_clone(struct request *rq, struct request *rq_src, extern void blk_rq_unprep_clone(struct request *rq); extern blk_status_t blk_insert_cloned_request(struct request_queue *q, struct request *rq); -extern int blk_rq_append_bio(struct request *rq, struct bio *bio); +extern int blk_rq_append_bio(struct 
request *rq, struct bio **bio); extern void blk_delay_queue(struct request_queue *, unsigned long); extern void blk_queue_split(struct request_queue *, struct bio **); extern void blk_recount_segments(struct request_queue *, struct bio *); From eaedee932c91120868d42df39a1fc17fa6a313ef Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Mon, 18 Dec 2017 15:40:43 +0800 Subject: [PATCH 0862/1013] block: don't let passthrough IO go into .make_request_fn() commit 14cb0dc6479dc5ebc63b3a459a5d89a2f1b39fed upstream. Commit a8821f3f3("block: Improvements to bounce-buffer handling") tries to make sure that the bio to .make_request_fn won't exceed BIO_MAX_PAGES, but ignores that passthrough I/O can use blk_queue_bounce() too. Especially, passthrough IO may not be sector-aligned, and the check of 'sectors < bio_sectors(*bio_orig)' inside __blk_queue_bounce() may become true even though the max bvec number doesn't exceed BIO_MAX_PAGES, then cause the bio splitted, and the original passthrough bio is submited to generic_make_request(). This patch fixes this issue by checking if the bio is passthrough IO, and use bio_kmalloc() to allocate the cloned passthrough bio. Cc: NeilBrown Fixes: a8821f3f3("block: Improvements to bounce-buffer handling") Tested-by: Michele Ballabio Signed-off-by: Ming Lei Signed-off-by: Jens Axboe Signed-off-by: Greg Kroah-Hartman --- block/bounce.c | 6 ++++-- include/linux/blkdev.h | 21 +++++++++++++++++++-- 2 files changed, 23 insertions(+), 4 deletions(-) diff --git a/block/bounce.c b/block/bounce.c index fceb1a96480b..1d05c422c932 100644 --- a/block/bounce.c +++ b/block/bounce.c @@ -200,6 +200,7 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig, unsigned i = 0; bool bounce = false; int sectors = 0; + bool passthrough = bio_is_passthrough(*bio_orig); bio_for_each_segment(from, *bio_orig, iter) { if (i++ < BIO_MAX_PAGES) @@ -210,13 +211,14 @@ static void __blk_queue_bounce(struct request_queue *q, struct bio **bio_orig, if (!bounce) return; - if (sectors < bio_sectors(*bio_orig)) { + if (!passthrough && sectors < bio_sectors(*bio_orig)) { bio = bio_split(*bio_orig, sectors, GFP_NOIO, bounce_bio_split); bio_chain(bio, *bio_orig); generic_make_request(*bio_orig); *bio_orig = bio; } - bio = bio_clone_bioset(*bio_orig, GFP_NOIO, bounce_bio_set); + bio = bio_clone_bioset(*bio_orig, GFP_NOIO, passthrough ? 
NULL : + bounce_bio_set); bio_for_each_segment_all(to, bio, i) { struct page *page = to->bv_page; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 1329dca3a914..6362e3606aa5 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -241,14 +241,24 @@ struct request { struct request *next_rq; }; +static inline bool blk_op_is_scsi(unsigned int op) +{ + return op == REQ_OP_SCSI_IN || op == REQ_OP_SCSI_OUT; +} + +static inline bool blk_op_is_private(unsigned int op) +{ + return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT; +} + static inline bool blk_rq_is_scsi(struct request *rq) { - return req_op(rq) == REQ_OP_SCSI_IN || req_op(rq) == REQ_OP_SCSI_OUT; + return blk_op_is_scsi(req_op(rq)); } static inline bool blk_rq_is_private(struct request *rq) { - return req_op(rq) == REQ_OP_DRV_IN || req_op(rq) == REQ_OP_DRV_OUT; + return blk_op_is_private(req_op(rq)); } static inline bool blk_rq_is_passthrough(struct request *rq) @@ -256,6 +266,13 @@ static inline bool blk_rq_is_passthrough(struct request *rq) return blk_rq_is_scsi(rq) || blk_rq_is_private(rq); } +static inline bool bio_is_passthrough(struct bio *bio) +{ + unsigned op = bio_op(bio); + + return blk_op_is_scsi(op) || blk_op_is_private(op); +} + static inline unsigned short req_get_ioprio(struct request *req) { return req->ioprio; From aa7f9011bc01b22658d4c255e2feecf0193e092d Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 29 Dec 2017 17:34:43 -0800 Subject: [PATCH 0863/1013] kbuild: add '-fno-stack-check' to kernel build options MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3ce120b16cc548472f80cf8644f90eda958cf1b6 upstream. It appears that hardened gentoo enables "-fstack-check" by default for gcc. That doesn't work _at_all_ for the kernel, because the kernel stack doesn't act like a user stack at all: it's much smaller, and it doesn't auto-expand on use. So the extra "probe one page below the stack" code generated by -fstack-check just breaks the kernel in horrible ways, causing infinite double faults etc. [ I have to say, that the particular code gcc generates looks very stupid even for user space where it works, but that's a separate issue. ] Reported-and-tested-by: Alexander Tsoy Reported-and-tested-by: Toralf Förster Cc: Dave Hansen Cc: Jiri Kosina Cc: Andy Lutomirski Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- Makefile | 3 +++ 1 file changed, 3 insertions(+) diff --git a/Makefile b/Makefile index 9edfb78836a9..930fa93f397d 100644 --- a/Makefile +++ b/Makefile @@ -802,6 +802,9 @@ KBUILD_CFLAGS += $(call cc-disable-warning, pointer-sign) # disable invalid "can't wrap" optimizations for signed / pointers KBUILD_CFLAGS += $(call cc-option,-fno-strict-overflow) +# Make sure -fstack-check isn't enabled (like gentoo apparently did) +KBUILD_CFLAGS += $(call cc-option,-fno-stack-check,) + # conserve stack if available KBUILD_CFLAGS += $(call cc-option,-fconserve-stack) From 57dfc3d10e40053e70691260a791bb453f183cfd Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 11 Dec 2017 07:17:39 -0800 Subject: [PATCH 0864/1013] ipv4: igmp: guard against silly MTU values MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit b5476022bbada3764609368f03329ca287528dc8 ] IPv4 stack reacts to changes to small MTU, by disabling itself under RTNL. But there is a window where threads not using RTNL can see a wrong device mtu. 
This can lead to surprises, in igmp code where it is assumed the mtu is suitable. Fix this by reading device mtu once and checking IPv4 minimal MTU. This patch adds missing IPV4_MIN_MTU define, to not abuse ETH_MIN_MTU anymore. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/net/ip.h | 1 + net/ipv4/devinet.c | 2 +- net/ipv4/igmp.c | 24 +++++++++++++++--------- net/ipv4/ip_tunnel.c | 4 ++-- 4 files changed, 19 insertions(+), 12 deletions(-) diff --git a/include/net/ip.h b/include/net/ip.h index 9896f46cbbf1..af8addbaa3c1 100644 --- a/include/net/ip.h +++ b/include/net/ip.h @@ -34,6 +34,7 @@ #include #define IPV4_MAX_PMTU 65535U /* RFC 2675, Section 5.1 */ +#define IPV4_MIN_MTU 68 /* RFC 791 */ struct sock; diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index d7adc0616599..bffa88ecc534 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -1420,7 +1420,7 @@ static void inetdev_changename(struct net_device *dev, struct in_device *in_dev) static bool inetdev_valid_mtu(unsigned int mtu) { - return mtu >= 68; + return mtu >= IPV4_MIN_MTU; } static void inetdev_send_gratuitous_arp(struct net_device *dev, diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index ab183af0b5b6..86f1786fa94b 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -404,16 +404,17 @@ static int grec_size(struct ip_mc_list *pmc, int type, int gdel, int sdel) } static struct sk_buff *add_grhead(struct sk_buff *skb, struct ip_mc_list *pmc, - int type, struct igmpv3_grec **ppgr) + int type, struct igmpv3_grec **ppgr, unsigned int mtu) { struct net_device *dev = pmc->interface->dev; struct igmpv3_report *pih; struct igmpv3_grec *pgr; - if (!skb) - skb = igmpv3_newpack(dev, dev->mtu); - if (!skb) - return NULL; + if (!skb) { + skb = igmpv3_newpack(dev, mtu); + if (!skb) + return NULL; + } pgr = skb_put(skb, sizeof(struct igmpv3_grec)); pgr->grec_type = type; pgr->grec_auxwords = 0; @@ -436,12 +437,17 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ip_mc_list *pmc, struct igmpv3_grec *pgr = NULL; struct ip_sf_list *psf, *psf_next, *psf_prev, **psf_list; int scount, stotal, first, isquery, truncate; + unsigned int mtu; if (pmc->multiaddr == IGMP_ALL_HOSTS) return skb; if (ipv4_is_local_multicast(pmc->multiaddr) && !net->ipv4.sysctl_igmp_llm_reports) return skb; + mtu = READ_ONCE(dev->mtu); + if (mtu < IPV4_MIN_MTU) + return skb; + isquery = type == IGMPV3_MODE_IS_INCLUDE || type == IGMPV3_MODE_IS_EXCLUDE; truncate = type == IGMPV3_MODE_IS_EXCLUDE || @@ -462,7 +468,7 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ip_mc_list *pmc, AVAILABLE(skb) < grec_size(pmc, type, gdeleted, sdeleted)) { if (skb) igmpv3_sendpack(skb); - skb = igmpv3_newpack(dev, dev->mtu); + skb = igmpv3_newpack(dev, mtu); } } first = 1; @@ -498,12 +504,12 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ip_mc_list *pmc, pgr->grec_nsrcs = htons(scount); if (skb) igmpv3_sendpack(skb); - skb = igmpv3_newpack(dev, dev->mtu); + skb = igmpv3_newpack(dev, mtu); first = 1; scount = 0; } if (first) { - skb = add_grhead(skb, pmc, type, &pgr); + skb = add_grhead(skb, pmc, type, &pgr, mtu); first = 0; } if (!skb) @@ -538,7 +544,7 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ip_mc_list *pmc, igmpv3_sendpack(skb); skb = NULL; /* add_grhead will get a new one */ } - skb = add_grhead(skb, pmc, type, &pgr); + skb = add_grhead(skb, pmc, type, &pgr, mtu); } } if (pgr) diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c index 
e9805ad664ac..4e90082b23a6 100644 --- a/net/ipv4/ip_tunnel.c +++ b/net/ipv4/ip_tunnel.c @@ -349,8 +349,8 @@ static int ip_tunnel_bind_dev(struct net_device *dev) dev->needed_headroom = t_hlen + hlen; mtu -= (dev->hard_header_len + t_hlen); - if (mtu < 68) - mtu = 68; + if (mtu < IPV4_MIN_MTU) + mtu = IPV4_MIN_MTU; return mtu; } From 521f4d9625b7b2359dbec6039724b4feb9e8490d Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 11 Dec 2017 07:03:38 -0800 Subject: [PATCH 0865/1013] ipv6: mcast: better catch silly mtu values [ Upstream commit b9b312a7a451e9c098921856e7cfbc201120e1a7 ] syzkaller reported crashes in IPv6 stack [1] Xin Long found that lo MTU was set to silly values. IPv6 stack reacts to changes to small MTU, by disabling itself under RTNL. But there is a window where threads not using RTNL can see a wrong device mtu. This can lead to surprises, in mld code where it is assumed the mtu is suitable. Fix this by reading device mtu once and checking IPv6 minimal MTU. [1] skbuff: skb_over_panic: text:0000000010b86b8d len:196 put:20 head:000000003b477e60 data:000000000e85441e tail:0xd4 end:0xc0 dev:lo ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:104! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.15.0-rc2-mm1+ #39 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_panic+0x15c/0x1f0 net/core/skbuff.c:100 RSP: 0018:ffff8801db307508 EFLAGS: 00010286 RAX: 0000000000000082 RBX: ffff8801c517e840 RCX: 0000000000000000 RDX: 0000000000000082 RSI: 1ffff1003b660e61 RDI: ffffed003b660e95 RBP: ffff8801db307570 R08: 1ffff1003b660e23 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff85bd4020 R13: ffffffff84754ed2 R14: 0000000000000014 R15: ffff8801c4e26540 FS: 0000000000000000(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000463610 CR3: 00000001c6698000 CR4: 00000000001406e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: skb_over_panic net/core/skbuff.c:109 [inline] skb_put+0x181/0x1c0 net/core/skbuff.c:1694 add_grhead.isra.24+0x42/0x3b0 net/ipv6/mcast.c:1695 add_grec+0xa55/0x1060 net/ipv6/mcast.c:1817 mld_send_cr net/ipv6/mcast.c:1903 [inline] mld_ifc_timer_expire+0x4d2/0x770 net/ipv6/mcast.c:2448 call_timer_fn+0x23b/0x840 kernel/time/timer.c:1320 expire_timers kernel/time/timer.c:1357 [inline] __run_timers+0x7e1/0xb60 kernel/time/timer.c:1660 run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 __do_softirq+0x29d/0xbb2 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d3/0x210 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:540 [inline] smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:920 Signed-off-by: Eric Dumazet Reported-by: syzbot Tested-by: Xin Long Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/mcast.c | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) diff --git a/net/ipv6/mcast.c b/net/ipv6/mcast.c index 12b7c27ce5ce..9a38a2c641fa 100644 --- a/net/ipv6/mcast.c +++ b/net/ipv6/mcast.c @@ -1682,16 +1682,16 @@ static int grec_size(struct ifmcaddr6 *pmc, int type, int gdel, int sdel) } static struct sk_buff *add_grhead(struct sk_buff *skb, struct ifmcaddr6 *pmc, - int type, struct mld2_grec **ppgr) + int type, struct mld2_grec **ppgr, unsigned int mtu) { - struct net_device *dev = pmc->idev->dev; struct mld2_report *pmr; struct mld2_grec *pgr; - if (!skb) - skb = mld_newpack(pmc->idev, dev->mtu); - if (!skb) - return NULL; + if (!skb) { + skb = mld_newpack(pmc->idev, mtu); + if (!skb) + return NULL; + } pgr = skb_put(skb, sizeof(struct mld2_grec)); pgr->grec_type = type; pgr->grec_auxwords = 0; @@ -1714,10 +1714,15 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ifmcaddr6 *pmc, struct mld2_grec *pgr = NULL; struct ip6_sf_list *psf, *psf_next, *psf_prev, **psf_list; int scount, stotal, first, isquery, truncate; + unsigned int mtu; if (pmc->mca_flags & MAF_NOREPORT) return skb; + mtu = READ_ONCE(dev->mtu); + if (mtu < IPV6_MIN_MTU) + return skb; + isquery = type == MLD2_MODE_IS_INCLUDE || type == MLD2_MODE_IS_EXCLUDE; truncate = type == MLD2_MODE_IS_EXCLUDE || @@ -1738,7 +1743,7 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ifmcaddr6 *pmc, AVAILABLE(skb) < grec_size(pmc, type, gdeleted, sdeleted)) { if (skb) mld_sendpack(skb); - skb = mld_newpack(idev, dev->mtu); + skb = mld_newpack(idev, mtu); } } first = 1; @@ -1774,12 +1779,12 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ifmcaddr6 *pmc, pgr->grec_nsrcs = htons(scount); if (skb) mld_sendpack(skb); - skb = mld_newpack(idev, dev->mtu); + skb = mld_newpack(idev, mtu); first = 1; scount = 0; } if (first) { - skb = add_grhead(skb, pmc, type, &pgr); + skb = add_grhead(skb, pmc, type, &pgr, mtu); first = 0; } if (!skb) @@ -1814,7 +1819,7 @@ static struct sk_buff *add_grec(struct sk_buff *skb, struct ifmcaddr6 *pmc, mld_sendpack(skb); skb = NULL; /* add_grhead will get a new one */ } - skb = add_grhead(skb, pmc, type, &pgr); + skb = add_grhead(skb, pmc, type, &pgr, mtu); } } if (pgr) From f55ac6684640a285465cf31a8d5e23bcc3da872d Mon Sep 17 00:00:00 2001 From: Fugang Duan Date: Fri, 22 Dec 2017 17:12:09 +0800 Subject: [PATCH 0866/1013] net: fec: unmap the xmit buffer that are not transferred by DMA [ Upstream commit 178e5f57a8d8f8fc5799a624b96fc31ef9a29ffa ] The enet IP only support 32 bit, it will use swiotlb buffer to do dma mapping when xmit buffer DMA memory address is bigger than 4G in i.MX platform. After stress suspend/resume test, it will print out: log: [12826.352864] fec 5b040000.ethernet: swiotlb buffer is full (sz: 191 bytes) [12826.359676] DMA: Out of SW-IOMMU space for 191 bytes at device 5b040000.ethernet [12826.367110] fec 5b040000.ethernet eth0: Tx DMA memory map failed The issue is that the ready xmit buffers that are dma mapped but DMA still don't copy them into fifo, once MAC restart, these DMA buffers are not unmapped. So it should check the dma mapping buffer and unmap them. Signed-off-by: Fugang Duan Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/freescale/fec_main.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 3dc2d771a222..faf7cdc97ebf 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -818,6 +818,12 @@ static void fec_enet_bd_init(struct net_device *dev) for (i = 0; i < txq->bd.ring_size; i++) { /* Initialize the BD for every fragment in the page. */ bdp->cbd_sc = cpu_to_fec16(0); + if (bdp->cbd_bufaddr && + !IS_TSO_HEADER(txq, fec32_to_cpu(bdp->cbd_bufaddr))) + dma_unmap_single(&fep->pdev->dev, + fec32_to_cpu(bdp->cbd_bufaddr), + fec16_to_cpu(bdp->cbd_datlen), + DMA_TO_DEVICE); if (txq->tx_skbuff[i]) { dev_kfree_skb_any(txq->tx_skbuff[i]); txq->tx_skbuff[i] = NULL; From f9c48469278060450e99c3995c5b0717c48d50b8 Mon Sep 17 00:00:00 2001 From: Kevin Cernekee Date: Mon, 11 Dec 2017 11:13:45 -0800 Subject: [PATCH 0867/1013] net: igmp: Use correct source address on IGMPv3 reports [ Upstream commit a46182b00290839fa3fa159d54fd3237bd8669f0 ] Closing a multicast socket after the final IPv4 address is deleted from an interface can generate a membership report that uses the source IP from a different interface. The following test script, run from an isolated netns, reproduces the issue: #!/bin/bash ip link add dummy0 type dummy ip link add dummy1 type dummy ip link set dummy0 up ip link set dummy1 up ip addr add 10.1.1.1/24 dev dummy0 ip addr add 192.168.99.99/24 dev dummy1 tcpdump -U -i dummy0 & socat EXEC:"sleep 2" \ UDP4-DATAGRAM:239.101.1.68:8889,ip-add-membership=239.0.1.68:10.1.1.1 & sleep 1 ip addr del 10.1.1.1/24 dev dummy0 sleep 5 kill %tcpdump RFC 3376 specifies that the report must be sent with a valid IP source address from the destination subnet, or from address 0.0.0.0. Add an extra check to make sure this is the case. Signed-off-by: Kevin Cernekee Reviewed-by: Andrew Lunn Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/igmp.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/net/ipv4/igmp.c b/net/ipv4/igmp.c index 86f1786fa94b..c621266e0306 100644 --- a/net/ipv4/igmp.c +++ b/net/ipv4/igmp.c @@ -89,6 +89,7 @@ #include #include #include +#include #include #include @@ -321,6 +322,23 @@ igmp_scount(struct ip_mc_list *pmc, int type, int gdeleted, int sdeleted) return scount; } +/* source address selection per RFC 3376 section 4.2.13 */ +static __be32 igmpv3_get_srcaddr(struct net_device *dev, + const struct flowi4 *fl4) +{ + struct in_device *in_dev = __in_dev_get_rcu(dev); + + if (!in_dev) + return htonl(INADDR_ANY); + + for_ifa(in_dev) { + if (inet_ifa_match(fl4->saddr, ifa)) + return fl4->saddr; + } endfor_ifa(in_dev); + + return htonl(INADDR_ANY); +} + static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu) { struct sk_buff *skb; @@ -368,7 +386,7 @@ static struct sk_buff *igmpv3_newpack(struct net_device *dev, unsigned int mtu) pip->frag_off = htons(IP_DF); pip->ttl = 1; pip->daddr = fl4.daddr; - pip->saddr = fl4.saddr; + pip->saddr = igmpv3_get_srcaddr(dev, &fl4); pip->protocol = IPPROTO_IGMP; pip->tot_len = 0; /* filled in later */ ip_select_ident(net, skb, NULL); From e3fb538e5715250d6a61a26925215229f2e9f52f Mon Sep 17 00:00:00 2001 From: Kevin Cernekee Date: Wed, 6 Dec 2017 12:12:27 -0800 Subject: [PATCH 0868/1013] netlink: Add netns check on taps [ Upstream commit 93c647643b48f0131f02e45da3bd367d80443291 ] Currently, a nlmon link inside a child namespace can observe systemwide netlink activity. Filter the traffic so that nlmon can only sniff netlink messages from its own netns. Test case: vpnns -- bash -c "ip link add nlmon0 type nlmon; \ ip link set nlmon0 up; \ tcpdump -i nlmon0 -q -w /tmp/nlmon.pcap -U" & sudo ip xfrm state add src 10.1.1.1 dst 10.1.1.2 proto esp \ spi 0x1 mode transport \ auth sha1 0x6162633132330000000000000000000000000000 \ enc aes 0x00000000000000000000000000000000 grep --binary abc123 /tmp/nlmon.pcap Signed-off-by: Kevin Cernekee Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/netlink/af_netlink.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 15c99dfa3d72..aac9d68b4636 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -254,6 +254,9 @@ static int __netlink_deliver_tap_skb(struct sk_buff *skb, struct sock *sk = skb->sk; int ret = -ENOMEM; + if (!net_eq(dev_net(dev), sock_net(sk))) + return 0; + dev_hold(dev); if (is_vmalloc_addr(skb->head)) From 1bad9c5ea85e7522d403559cc249dd7a0ae705b1 Mon Sep 17 00:00:00 2001 From: Sebastian Sjoholm Date: Mon, 11 Dec 2017 21:51:14 +0100 Subject: [PATCH 0869/1013] net: qmi_wwan: add Sierra EM7565 1199:9091 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit [ Upstream commit aceef61ee56898cfa7b6960fb60b9326c3860441 ] Sierra Wireless EM7565 is an Qualcomm MDM9x50 based M.2 modem. The USB id is added to qmi_wwan.c to allow QMI communication with the EM7565. Signed-off-by: Sebastian Sjoholm Acked-by: Bjørn Mork Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/usb/qmi_wwan.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c index 81394a4b2803..2092febfcb42 100644 --- a/drivers/net/usb/qmi_wwan.c +++ b/drivers/net/usb/qmi_wwan.c @@ -1204,6 +1204,7 @@ static const struct usb_device_id products[] = { {QMI_FIXED_INTF(0x1199, 0x9079, 10)}, /* Sierra Wireless EM74xx */ {QMI_FIXED_INTF(0x1199, 0x907b, 8)}, /* Sierra Wireless EM74xx */ {QMI_FIXED_INTF(0x1199, 0x907b, 10)}, /* Sierra Wireless EM74xx */ + {QMI_FIXED_INTF(0x1199, 0x9091, 8)}, /* Sierra Wireless EM7565 */ {QMI_FIXED_INTF(0x1bbb, 0x011e, 4)}, /* Telekom Speedstick LTE II (Alcatel One Touch L100V LTE) */ {QMI_FIXED_INTF(0x1bbb, 0x0203, 2)}, /* Alcatel L800MA */ {QMI_FIXED_INTF(0x2357, 0x0201, 4)}, /* TP-LINK HSUPA Modem MA180 */ From 6d0317869c9153d7e6efa769a764b54019d4a3d3 Mon Sep 17 00:00:00 2001 From: Shaohua Li Date: Wed, 20 Dec 2017 12:10:21 -0800 Subject: [PATCH 0870/1013] net: reevalulate autoflowlabel setting after sysctl setting [ Upstream commit 513674b5a2c9c7a67501506419da5c3c77ac6f08 ] sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901f7c4 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau Cc: Eric Dumazet Cc: Tom Herbert Signed-off-by: Shaohua Li Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/ipv6.h | 3 ++- net/ipv6/af_inet6.c | 1 - net/ipv6/ip6_output.c | 12 ++++++++++-- net/ipv6/ipv6_sockglue.c | 1 + 4 files changed, 13 insertions(+), 4 deletions(-) diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index ea04ca024f0d..067a6fa675ed 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -272,7 +272,8 @@ struct ipv6_pinfo { * 100: prefer care-of address */ dontfrag:1, - autoflowlabel:1; + autoflowlabel:1, + autoflowlabel_set:1; __u8 min_hopcount; __u8 tclass; __be32 rcv_flowinfo; diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index fe5262fd6aa5..bcbd5f3bf8bd 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -210,7 +210,6 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol, np->mcast_hops = IPV6_DEFAULT_MCASTHOPS; np->mc_loop = 1; np->pmtudisc = IPV6_PMTUDISC_WANT; - np->autoflowlabel = ip6_default_np_autolabel(net); np->repflow = net->ipv6.sysctl.flowlabel_reflect; sk->sk_ipv6only = net->ipv6.sysctl.bindv6only; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 5110a418cc4d..f7dd51c42314 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -166,6 +166,14 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb) !(IP6CB(skb)->flags & IP6SKB_REROUTED)); } +static bool ip6_autoflowlabel(struct net *net, const struct ipv6_pinfo *np) +{ + if (!np->autoflowlabel_set) + return ip6_default_np_autolabel(net); + else + return np->autoflowlabel; +} + /* * xmit an sk_buff (used by TCP, SCTP and DCCP) * Note : socket lock is not held for SYNACK packets, but might be modified @@ -230,7 +238,7 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6, hlimit = ip6_dst_hoplimit(dst); ip6_flow_hdr(hdr, tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel, - np->autoflowlabel, fl6)); + ip6_autoflowlabel(net, np), fl6)); hdr->payload_len = htons(seg_len); hdr->nexthdr = proto; @@ -1626,7 +1634,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, ip6_flow_hdr(hdr, v6_cork->tclass, ip6_make_flowlabel(net, skb, fl6->flowlabel, - np->autoflowlabel, fl6)); + ip6_autoflowlabel(net, np), fl6)); hdr->hop_limit = v6_cork->hop_limit; hdr->nexthdr = proto; hdr->saddr = fl6->saddr; diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index a5e466d4e093..90dbfa78a390 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -878,6 +878,7 @@ static int do_ipv6_setsockopt(struct sock *sk, int level, int optname, break; case IPV6_AUTOFLOWLABEL: np->autoflowlabel = valbool; + np->autoflowlabel_set = 1; retv = 0; break; case IPV6_RECVFRAGSIZE: From 78ce0e9c4183a34b0910ec36c4ad8df1f7bf7874 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Tue, 5 Dec 2017 21:29:37 +0200 Subject: [PATCH 0871/1013] ptr_ring: add barriers [ Upstream commit a8ceb5dbfde1092b466936bca0ff3be127ecf38e ] Users of ptr_ring expect that it's safe to give the data structure a pointer and have it be available to consumers, but that actually requires an smb_wmb or a stronger barrier. In absence of such barriers and on architectures that reorder writes, consumer might read an un=initialized value from an skb pointer stored in the skb array. This was observed causing crashes. To fix, add memory barriers. The barrier we use is a wmb, the assumption being that producers do not need to read the value so we do not need to order these reads. Reported-by: George Cherian Suggested-by: Jason Wang Signed-off-by: Michael S. 
Tsirkin Acked-by: Jason Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/ptr_ring.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h index 37b4bb2545b3..6866df4f31b5 100644 --- a/include/linux/ptr_ring.h +++ b/include/linux/ptr_ring.h @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r) /* Note: callers invoking this in a loop must use a compiler barrier, * for example cpu_relax(). Callers must hold producer_lock. + * Callers are responsible for making sure pointer that is being queued + * points to a valid data. */ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr) { if (unlikely(!r->size) || r->queue[r->producer]) return -ENOSPC; + /* Make sure the pointer we are storing points to a valid data. */ + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */ + smp_wmb(); + r->queue[r->producer++] = ptr; if (unlikely(r->producer >= r->size)) r->producer = 0; @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r) if (ptr) __ptr_ring_discard_one(r); + /* Make sure anyone accessing data through the pointer is up to date. */ + /* Pairs with smp_wmb in __ptr_ring_produce. */ + smp_read_barrier_depends(); return ptr; } From e7728247372ca9f7faccca4a37e96964f796c94b Mon Sep 17 00:00:00 2001 From: Avinash Repaka Date: Thu, 21 Dec 2017 20:17:04 -0800 Subject: [PATCH 0872/1013] RDS: Check cmsg_len before dereferencing CMSG_DATA [ Upstream commit 14e138a86f6347c6199f610576d2e11c03bec5f0 ] RDS currently doesn't check if the length of the control message is large enough to hold the required data, before dereferencing the control message data. This results in following crash: BUG: KASAN: stack-out-of-bounds in rds_rdma_bytes net/rds/send.c:1013 [inline] BUG: KASAN: stack-out-of-bounds in rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066 Read of size 8 at addr ffff8801c928fb70 by task syzkaller455006/3157 CPU: 0 PID: 3157 Comm: syzkaller455006 Not tainted 4.15.0-rc3+ #161 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430 rds_rdma_bytes net/rds/send.c:1013 [inline] rds_sendmsg+0x1f02/0x1f90 net/rds/send.c:1066 sock_sendmsg_nosec net/socket.c:628 [inline] sock_sendmsg+0xca/0x110 net/socket.c:638 ___sys_sendmsg+0x320/0x8b0 net/socket.c:2018 __sys_sendmmsg+0x1ee/0x620 net/socket.c:2108 SYSC_sendmmsg net/socket.c:2139 [inline] SyS_sendmmsg+0x35/0x60 net/socket.c:2134 entry_SYSCALL_64_fastpath+0x1f/0x96 RIP: 0033:0x43fe49 RSP: 002b:00007fffbe244ad8 EFLAGS: 00000217 ORIG_RAX: 0000000000000133 RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 000000000043fe49 RDX: 0000000000000001 RSI: 000000002020c000 RDI: 0000000000000003 RBP: 00000000006ca018 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000217 R12: 00000000004017b0 R13: 0000000000401840 R14: 0000000000000000 R15: 0000000000000000 To fix this, we verify that the cmsg_len is large enough to hold the data to be read, before proceeding further. Reported-by: syzbot Signed-off-by: Avinash Repaka Acked-by: Santosh Shilimkar Reviewed-by: Yuval Shaia Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/rds/send.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/net/rds/send.c b/net/rds/send.c index b52cdc8ae428..f72466c63f0c 100644 --- a/net/rds/send.c +++ b/net/rds/send.c @@ -1009,6 +1009,9 @@ static int rds_rdma_bytes(struct msghdr *msg, size_t *rdma_bytes) continue; if (cmsg->cmsg_type == RDS_CMSG_RDMA_ARGS) { + if (cmsg->cmsg_len < + CMSG_LEN(sizeof(struct rds_rdma_args))) + return -EINVAL; args = CMSG_DATA(cmsg); *rdma_bytes += args->remote_vec.bytes; } From e414e7f03c29a4d3f07e042e18bdfe84269897df Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Thu, 7 Dec 2017 12:43:30 -0500 Subject: [PATCH 0873/1013] tcp_bbr: record "full bw reached" decision in new full_bw_reached bit [ Upstream commit c589e69b508d29ed8e644dfecda453f71c02ec27 ] This commit records the "full bw reached" decision in a new full_bw_reached bit. This is a pure refactor that does not change the current behavior, but enables subsequent fixes and improvements. In particular, this enables simple and clean fixes because the full_bw and full_bw_cnt can be unconditionally zeroed without worrying about forgetting that we estimated we filled the pipe in Startup. And it enables future improvements because multiple code paths can be used for estimating that we filled the pipe in Startup; any new code paths only need to set this bit when they think the pipe is full. Note that this fix intentionally reduces the width of the full_bw_cnt counter, since we have never used the most significant bit. Signed-off-by: Neal Cardwell Reviewed-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_bbr.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 69ee877574d0..3089c956b9f9 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -110,7 +110,8 @@ struct bbr { u32 lt_last_lost; /* LT intvl start: tp->lost */ u32 pacing_gain:10, /* current gain for setting pacing rate */ cwnd_gain:10, /* current gain for setting cwnd */ - full_bw_cnt:3, /* number of rounds without large bw gains */ + full_bw_reached:1, /* reached full bw in Startup? */ + full_bw_cnt:2, /* number of rounds without large bw gains */ cycle_idx:3, /* current index in pacing_gain cycle array */ has_seen_rtt:1, /* have we seen an RTT sample yet? */ unused_b:5; @@ -180,7 +181,7 @@ static bool bbr_full_bw_reached(const struct sock *sk) { const struct bbr *bbr = inet_csk_ca(sk); - return bbr->full_bw_cnt >= bbr_full_bw_cnt; + return bbr->full_bw_reached; } /* Return the windowed max recent bandwidth sample, in pkts/uS << BW_SCALE. */ @@ -717,6 +718,7 @@ static void bbr_check_full_bw_reached(struct sock *sk, return; } ++bbr->full_bw_cnt; + bbr->full_bw_reached = bbr->full_bw_cnt >= bbr_full_bw_cnt; } /* If pipe is probably full, drain the queue and then enter steady-state. */ @@ -850,6 +852,7 @@ static void bbr_init(struct sock *sk) bbr->restore_cwnd = 0; bbr->round_start = 0; bbr->idle_restart = 0; + bbr->full_bw_reached = 0; bbr->full_bw = 0; bbr->full_bw_cnt = 0; bbr->cycle_mstamp = 0; From 4f2963559f29f34a8dbcc35a331f152fa3492224 Mon Sep 17 00:00:00 2001 From: Christoph Paasch Date: Mon, 11 Dec 2017 00:05:46 -0800 Subject: [PATCH 0874/1013] tcp md5sig: Use skb's saddr when replying to an incoming segment [ Upstream commit 30791ac41927ebd3e75486f9504b6d2280463bf0 ] The MD5-key that belongs to a connection is identified by the peer's IP-address. 
When we are in tcp_v4(6)_reqsk_send_ack(), we are replying to an incoming segment from tcp_check_req() that failed the seq-number checks. Thus, to find the correct key, we need to use the skb's saddr and not the daddr. This bug seems to have been there since quite a while, but probably got unnoticed because the consequences are not catastrophic. We will call tcp_v4_reqsk_send_ack only to send a challenge-ACK back to the peer, thus the connection doesn't really fail. Fixes: 9501f9722922 ("tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash().") Signed-off-by: Christoph Paasch Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_ipv4.c | 2 +- net/ipv6/tcp_ipv6.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 5a5ed4f14678..cab4b935e474 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -844,7 +844,7 @@ static void tcp_v4_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, tcp_time_stamp_raw() + tcp_rsk(req)->ts_off, req->ts_recent, 0, - tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->daddr, + tcp_md5_do_lookup(sk, (union tcp_md5_addr *)&ip_hdr(skb)->saddr, AF_INET), inet_rsk(req)->no_srccheck ? IP_REPLY_ARG_NOSRCCHECK : 0, ip_hdr(skb)->tos); diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 32ded300633d..237cc6187c5a 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -988,7 +988,7 @@ static void tcp_v6_reqsk_send_ack(const struct sock *sk, struct sk_buff *skb, req->rsk_rcv_wnd >> inet_rsk(req)->rcv_wscale, tcp_time_stamp_raw() + tcp_rsk(req)->ts_off, req->ts_recent, sk->sk_bound_dev_if, - tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->daddr), + tcp_v6_md5_do_lookup(sk, &ipv6_hdr(skb)->saddr), 0, 0); } From d3f3d4134eb700ad087691effbdf26fe8f3bfd90 Mon Sep 17 00:00:00 2001 From: Brian King Date: Fri, 15 Dec 2017 15:21:50 -0600 Subject: [PATCH 0875/1013] tg3: Fix rx hang on MTU change with 5717/5719 [ Upstream commit 748a240c589824e9121befb1cba5341c319885bc ] This fixes a hang issue seen when changing the MTU size from 1500 MTU to 9000 MTU on both 5717 and 5719 chips. In discussion with Broadcom, they've indicated that these chipsets have the same phy as the 57766 chipset, so the same workarounds apply. This has been tested by IBM on both Power 8 and Power 9 systems as well as by Broadcom on x86 hardware and has been confirmed to resolve the hang issue. Signed-off-by: Brian King Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/tg3.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 656e6af70f0a..aef3fcf2f5b9 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -14227,7 +14227,9 @@ static int tg3_change_mtu(struct net_device *dev, int new_mtu) /* Reset PHY, otherwise the read DMA engine will be in a mode that * breaks all requests to 256 bytes. 
*/ - if (tg3_asic_rev(tp) == ASIC_REV_57766) + if (tg3_asic_rev(tp) == ASIC_REV_57766 || + tg3_asic_rev(tp) == ASIC_REV_5717 || + tg3_asic_rev(tp) == ASIC_REV_5719) reset_phy = true; err = tg3_restart_hw(tp, reset_phy); From eb710b5f62ad3a18742bae70f91e8664ee23cbe3 Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Thu, 7 Dec 2017 12:43:31 -0500 Subject: [PATCH 0876/1013] tcp_bbr: reset full pipe detection on loss recovery undo [ Upstream commit 2f6c498e4f15d27852c04ed46d804a39137ba364 ] Fix BBR so that upon notification of a loss recovery undo BBR resets the full pipe detection (STARTUP exit) state machine. Under high reordering, reordering events can be interpreted as loss. If the reordering and spurious loss estimates are high enough, this could previously cause BBR to spuriously estimate that the pipe is full. Since spurious loss recovery means that our overall sending will have slowed down spuriously, this commit gives a flow more time to probe robustly for bandwidth and decide the pipe is really full. Signed-off-by: Neal Cardwell Reviewed-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_bbr.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index 3089c956b9f9..ab3ff14ea7f7 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -874,6 +874,10 @@ static u32 bbr_sndbuf_expand(struct sock *sk) */ static u32 bbr_undo_cwnd(struct sock *sk) { + struct bbr *bbr = inet_csk_ca(sk); + + bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */ + bbr->full_bw_cnt = 0; return tcp_sk(sk)->snd_cwnd; } From c7e9d724785dc9b6a423bddc6b4fa6fb1368692f Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Thu, 7 Dec 2017 12:43:32 -0500 Subject: [PATCH 0877/1013] tcp_bbr: reset long-term bandwidth sampling on loss recovery undo [ Upstream commit 600647d467c6d04b3954b41a6ee1795b5ae00550 ] Fix BBR so that upon notification of a loss recovery undo BBR resets long-term bandwidth sampling. Under high reordering, reordering events can be interpreted as loss. If the reordering and spurious loss estimates are high enough, this can cause BBR to spuriously estimate that we are seeing loss rates high enough to trigger long-term bandwidth estimation. To avoid that problem, this commit resets long-term bandwidth sampling on loss recovery undo events. Signed-off-by: Neal Cardwell Reviewed-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_bbr.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/ipv4/tcp_bbr.c b/net/ipv4/tcp_bbr.c index ab3ff14ea7f7..8322f26e770e 100644 --- a/net/ipv4/tcp_bbr.c +++ b/net/ipv4/tcp_bbr.c @@ -878,6 +878,7 @@ static u32 bbr_undo_cwnd(struct sock *sk) bbr->full_bw = 0; /* spurious slow-down; reset full pipe detection */ bbr->full_bw_cnt = 0; + bbr_reset_lt_bw_sampling(sk); return tcp_sk(sk)->snd_cwnd; } From 621b5ae0f9f4f9ef91bf441afc086ecf5e752d51 Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Wed, 13 Dec 2017 18:56:29 +0100 Subject: [PATCH 0878/1013] s390/qeth: apply takeover changes when mode is toggled [ Upstream commit 7fbd9493f0eeae8cef58300505a9ef5c8fce6313 ] Just as for an explicit enable/disable, toggling the takeover mode also requires that the IP addresses get updated. Otherwise all IPs that were added to the table before the mode-toggle, get registered with the old settings. Signed-off-by: Julian Wiedmann Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_core.h | 2 +- drivers/s390/net/qeth_core_main.c | 2 +- drivers/s390/net/qeth_l3_sys.c | 35 +++++++++++++++---------------- 3 files changed, 19 insertions(+), 20 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 5340efc673a9..0a7bf3cbb701 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -564,7 +564,7 @@ enum qeth_cq { }; struct qeth_ipato { - int enabled; + bool enabled; int invert4; int invert6; struct list_head entries; diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 330e5d3dadf3..be95f8c8f7a7 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -1479,7 +1479,7 @@ static int qeth_setup_card(struct qeth_card *card) qeth_set_intial_options(card); /* IP address takeover */ INIT_LIST_HEAD(&card->ipato.entries); - card->ipato.enabled = 0; + card->ipato.enabled = false; card->ipato.invert4 = 0; card->ipato.invert6 = 0; /* init QDIO stuff */ diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c index 7a829ad77783..30bfeb1004a3 100644 --- a/drivers/s390/net/qeth_l3_sys.c +++ b/drivers/s390/net/qeth_l3_sys.c @@ -372,6 +372,7 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev, struct qeth_card *card = dev_get_drvdata(dev); struct qeth_ipaddr *addr; int i, rc = 0; + bool enable; if (!card) return -EINVAL; @@ -384,25 +385,23 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev, } if (sysfs_streq(buf, "toggle")) { - card->ipato.enabled = (card->ipato.enabled)? 0 : 1; - } else if (sysfs_streq(buf, "1")) { - card->ipato.enabled = 1; - hash_for_each(card->ip_htable, i, addr, hnode) { - if ((addr->type == QETH_IP_TYPE_NORMAL) && - qeth_l3_is_addr_covered_by_ipato(card, addr)) - addr->set_flags |= - QETH_IPA_SETIP_TAKEOVER_FLAG; - } - } else if (sysfs_streq(buf, "0")) { - card->ipato.enabled = 0; - hash_for_each(card->ip_htable, i, addr, hnode) { - if (addr->set_flags & - QETH_IPA_SETIP_TAKEOVER_FLAG) - addr->set_flags &= - ~QETH_IPA_SETIP_TAKEOVER_FLAG; - } - } else + enable = !card->ipato.enabled; + } else if (kstrtobool(buf, &enable)) { rc = -EINVAL; + goto out; + } + + if (card->ipato.enabled == enable) + goto out; + card->ipato.enabled = enable; + + hash_for_each(card->ip_htable, i, addr, hnode) { + if (!enable) + addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG; + else if (addr->type == QETH_IP_TYPE_NORMAL && + qeth_l3_is_addr_covered_by_ipato(card, addr)) + addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG; + } out: mutex_unlock(&card->conf_mutex); return rc ? rc : count; From e34a43e57c218bf28f8ec73a1ce68fec41c4b20c Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Wed, 13 Dec 2017 18:56:30 +0100 Subject: [PATCH 0879/1013] s390/qeth: don't apply takeover changes to RXIP [ Upstream commit b22d73d6689fd902a66c08ebe71ab2f3b351e22f ] When takeover is switched off, current code clears the 'TAKEOVER' flag on all IPs. But the flag is also used for RXIP addresses, and those should not be affected by the takeover mode. Fix the behaviour by consistenly applying takover logic to NORMAL addresses only. Signed-off-by: Julian Wiedmann Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_l3_main.c | 5 +++-- drivers/s390/net/qeth_l3_sys.c | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 27185ab38136..56d1c756aef9 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -173,6 +173,8 @@ int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card, if (!card->ipato.enabled) return 0; + if (addr->type != QETH_IP_TYPE_NORMAL) + return 0; qeth_l3_convert_addr_to_bits((u8 *) &addr->u, addr_bits, (addr->proto == QETH_PROT_IPV4)? 4:16); @@ -289,8 +291,7 @@ int qeth_l3_add_ip(struct qeth_card *card, struct qeth_ipaddr *tmp_addr) memcpy(addr, tmp_addr, sizeof(struct qeth_ipaddr)); addr->ref_counter = 1; - if (addr->type == QETH_IP_TYPE_NORMAL && - qeth_l3_is_addr_covered_by_ipato(card, addr)) { + if (qeth_l3_is_addr_covered_by_ipato(card, addr)) { QETH_CARD_TEXT(card, 2, "tkovaddr"); addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG; } diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c index 30bfeb1004a3..aada072ef0e3 100644 --- a/drivers/s390/net/qeth_l3_sys.c +++ b/drivers/s390/net/qeth_l3_sys.c @@ -396,10 +396,11 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev, card->ipato.enabled = enable; hash_for_each(card->ip_htable, i, addr, hnode) { + if (addr->type != QETH_IP_TYPE_NORMAL) + continue; if (!enable) addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG; - else if (addr->type == QETH_IP_TYPE_NORMAL && - qeth_l3_is_addr_covered_by_ipato(card, addr)) + else if (qeth_l3_is_addr_covered_by_ipato(card, addr)) addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG; } out: From 8658408f284efd553a882ee717326845130c51d9 Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Wed, 13 Dec 2017 18:56:31 +0100 Subject: [PATCH 0880/1013] s390/qeth: lock IP table while applying takeover changes [ Upstream commit 8a03a3692b100d84785ee7a834e9215e304c9e00 ] Modifying the flags of an IP addr object needs to be protected against eg. concurrent removal of the same object from the IP table. Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback") Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_l3_sys.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c index aada072ef0e3..a2634836fd1e 100644 --- a/drivers/s390/net/qeth_l3_sys.c +++ b/drivers/s390/net/qeth_l3_sys.c @@ -395,6 +395,7 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev, goto out; card->ipato.enabled = enable; + spin_lock_bh(&card->ip_lock); hash_for_each(card->ip_htable, i, addr, hnode) { if (addr->type != QETH_IP_TYPE_NORMAL) continue; @@ -403,6 +404,7 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev, else if (qeth_l3_is_addr_covered_by_ipato(card, addr)) addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG; } + spin_unlock_bh(&card->ip_lock); out: mutex_unlock(&card->conf_mutex); return rc ? rc : count; From 72b44d0434c1a4688b55aaa59150d74344721306 Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Wed, 13 Dec 2017 18:56:32 +0100 Subject: [PATCH 0881/1013] s390/qeth: update takeover IPs after configuration change [ Upstream commit 02f510f326501470348a5df341e8232c3497bbbb ] Any modification to the takeover IP-ranges requires that we re-evaluate which IP addresses are takeover-eligible. 
Otherwise we might do takeover for some addresses when we no longer should, or vice-versa. Signed-off-by: Julian Wiedmann Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_core.h | 4 +- drivers/s390/net/qeth_core_main.c | 4 +- drivers/s390/net/qeth_l3.h | 2 +- drivers/s390/net/qeth_l3_main.c | 31 +++++++++++++-- drivers/s390/net/qeth_l3_sys.c | 63 +++++++++++++++++-------------- 5 files changed, 67 insertions(+), 37 deletions(-) diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h index 0a7bf3cbb701..92dd4aef21a3 100644 --- a/drivers/s390/net/qeth_core.h +++ b/drivers/s390/net/qeth_core.h @@ -565,8 +565,8 @@ enum qeth_cq { struct qeth_ipato { bool enabled; - int invert4; - int invert6; + bool invert4; + bool invert6; struct list_head entries; }; diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index be95f8c8f7a7..291eb89ecb06 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -1480,8 +1480,8 @@ static int qeth_setup_card(struct qeth_card *card) /* IP address takeover */ INIT_LIST_HEAD(&card->ipato.entries); card->ipato.enabled = false; - card->ipato.invert4 = 0; - card->ipato.invert6 = 0; + card->ipato.invert4 = false; + card->ipato.invert6 = false; /* init QDIO stuff */ qeth_init_qdio_info(card); INIT_DELAYED_WORK(&card->buffer_reclaim_work, qeth_buffer_reclaim_work); diff --git a/drivers/s390/net/qeth_l3.h b/drivers/s390/net/qeth_l3.h index 194ae9b577cc..e5833837b799 100644 --- a/drivers/s390/net/qeth_l3.h +++ b/drivers/s390/net/qeth_l3.h @@ -82,7 +82,7 @@ void qeth_l3_del_vipa(struct qeth_card *, enum qeth_prot_versions, const u8 *); int qeth_l3_add_rxip(struct qeth_card *, enum qeth_prot_versions, const u8 *); void qeth_l3_del_rxip(struct qeth_card *card, enum qeth_prot_versions, const u8 *); -int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *, struct qeth_ipaddr *); +void qeth_l3_update_ipato(struct qeth_card *card); struct qeth_ipaddr *qeth_l3_get_addr_buffer(enum qeth_prot_versions); int qeth_l3_add_ip(struct qeth_card *, struct qeth_ipaddr *); int qeth_l3_delete_ip(struct qeth_card *, struct qeth_ipaddr *); diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c index 56d1c756aef9..36dee176f8e2 100644 --- a/drivers/s390/net/qeth_l3_main.c +++ b/drivers/s390/net/qeth_l3_main.c @@ -163,8 +163,8 @@ static void qeth_l3_convert_addr_to_bits(u8 *addr, u8 *bits, int len) } } -int qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card, - struct qeth_ipaddr *addr) +static bool qeth_l3_is_addr_covered_by_ipato(struct qeth_card *card, + struct qeth_ipaddr *addr) { struct qeth_ipato_entry *ipatoe; u8 addr_bits[128] = {0, }; @@ -605,6 +605,27 @@ int qeth_l3_setrouting_v6(struct qeth_card *card) /* * IP address takeover related functions */ + +/** + * qeth_l3_update_ipato() - Update 'takeover' property, for all NORMAL IPs. + * + * Caller must hold ip_lock. 
+ */ +void qeth_l3_update_ipato(struct qeth_card *card) +{ + struct qeth_ipaddr *addr; + unsigned int i; + + hash_for_each(card->ip_htable, i, addr, hnode) { + if (addr->type != QETH_IP_TYPE_NORMAL) + continue; + if (qeth_l3_is_addr_covered_by_ipato(card, addr)) + addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG; + else + addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG; + } +} + static void qeth_l3_clear_ipato_list(struct qeth_card *card) { struct qeth_ipato_entry *ipatoe, *tmp; @@ -616,6 +637,7 @@ static void qeth_l3_clear_ipato_list(struct qeth_card *card) kfree(ipatoe); } + qeth_l3_update_ipato(card); spin_unlock_bh(&card->ip_lock); } @@ -640,8 +662,10 @@ int qeth_l3_add_ipato_entry(struct qeth_card *card, } } - if (!rc) + if (!rc) { list_add_tail(&new->entry, &card->ipato.entries); + qeth_l3_update_ipato(card); + } spin_unlock_bh(&card->ip_lock); @@ -664,6 +688,7 @@ void qeth_l3_del_ipato_entry(struct qeth_card *card, (proto == QETH_PROT_IPV4)? 4:16) && (ipatoe->mask_bits == mask_bits)) { list_del(&ipatoe->entry); + qeth_l3_update_ipato(card); kfree(ipatoe); } } diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c index a2634836fd1e..1295dd8ec849 100644 --- a/drivers/s390/net/qeth_l3_sys.c +++ b/drivers/s390/net/qeth_l3_sys.c @@ -370,9 +370,8 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct qeth_card *card = dev_get_drvdata(dev); - struct qeth_ipaddr *addr; - int i, rc = 0; bool enable; + int rc = 0; if (!card) return -EINVAL; @@ -391,20 +390,12 @@ static ssize_t qeth_l3_dev_ipato_enable_store(struct device *dev, goto out; } - if (card->ipato.enabled == enable) - goto out; - card->ipato.enabled = enable; - - spin_lock_bh(&card->ip_lock); - hash_for_each(card->ip_htable, i, addr, hnode) { - if (addr->type != QETH_IP_TYPE_NORMAL) - continue; - if (!enable) - addr->set_flags &= ~QETH_IPA_SETIP_TAKEOVER_FLAG; - else if (qeth_l3_is_addr_covered_by_ipato(card, addr)) - addr->set_flags |= QETH_IPA_SETIP_TAKEOVER_FLAG; + if (card->ipato.enabled != enable) { + card->ipato.enabled = enable; + spin_lock_bh(&card->ip_lock); + qeth_l3_update_ipato(card); + spin_unlock_bh(&card->ip_lock); } - spin_unlock_bh(&card->ip_lock); out: mutex_unlock(&card->conf_mutex); return rc ? rc : count; @@ -430,20 +421,27 @@ static ssize_t qeth_l3_dev_ipato_invert4_store(struct device *dev, const char *buf, size_t count) { struct qeth_card *card = dev_get_drvdata(dev); + bool invert; int rc = 0; if (!card) return -EINVAL; mutex_lock(&card->conf_mutex); - if (sysfs_streq(buf, "toggle")) - card->ipato.invert4 = (card->ipato.invert4)? 0 : 1; - else if (sysfs_streq(buf, "1")) - card->ipato.invert4 = 1; - else if (sysfs_streq(buf, "0")) - card->ipato.invert4 = 0; - else + if (sysfs_streq(buf, "toggle")) { + invert = !card->ipato.invert4; + } else if (kstrtobool(buf, &invert)) { rc = -EINVAL; + goto out; + } + + if (card->ipato.invert4 != invert) { + card->ipato.invert4 = invert; + spin_lock_bh(&card->ip_lock); + qeth_l3_update_ipato(card); + spin_unlock_bh(&card->ip_lock); + } +out: mutex_unlock(&card->conf_mutex); return rc ? rc : count; } @@ -609,20 +607,27 @@ static ssize_t qeth_l3_dev_ipato_invert6_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t count) { struct qeth_card *card = dev_get_drvdata(dev); + bool invert; int rc = 0; if (!card) return -EINVAL; mutex_lock(&card->conf_mutex); - if (sysfs_streq(buf, "toggle")) - card->ipato.invert6 = (card->ipato.invert6)? 
0 : 1; - else if (sysfs_streq(buf, "1")) - card->ipato.invert6 = 1; - else if (sysfs_streq(buf, "0")) - card->ipato.invert6 = 0; - else + if (sysfs_streq(buf, "toggle")) { + invert = !card->ipato.invert6; + } else if (kstrtobool(buf, &invert)) { rc = -EINVAL; + goto out; + } + + if (card->ipato.invert6 != invert) { + card->ipato.invert6 = invert; + spin_lock_bh(&card->ip_lock); + qeth_l3_update_ipato(card); + spin_unlock_bh(&card->ip_lock); + } +out: mutex_unlock(&card->conf_mutex); return rc ? rc : count; } From 3bc400bad0e003d40a0a2412411aed7cbae16f96 Mon Sep 17 00:00:00 2001 From: Mohamed Ghannam Date: Sun, 10 Dec 2017 03:50:58 +0000 Subject: [PATCH 0882/1013] net: ipv4: fix for a race condition in raw_sendmsg [ Upstream commit 8f659a03a0ba9289b9aeb9b4470e6fb263d6f483 ] inet->hdrincl is racy, and could lead to uninitialized stack pointer usage, so its value should be read only once. Fixes: c008ba5bdc9f ("ipv4: Avoid reading user iov twice after raw_probe_proto_opt") Signed-off-by: Mohamed Ghannam Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/raw.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index 33b70bfd1122..125c1eab3eaa 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -513,11 +513,16 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) int err; struct ip_options_data opt_copy; struct raw_frag_vec rfv; + int hdrincl; err = -EMSGSIZE; if (len > 0xFFFF) goto out; + /* hdrincl should be READ_ONCE(inet->hdrincl) + * but READ_ONCE() doesn't work with bit fields + */ + hdrincl = inet->hdrincl; /* * Check the flags. */ @@ -593,7 +598,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) /* Linux does not mangle headers on raw sockets, * so that IP options + IP_HDRINCL is non-sense. */ - if (inet->hdrincl) + if (hdrincl) goto done; if (ipc.opt->opt.srr) { if (!daddr) @@ -615,12 +620,12 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) flowi4_init_output(&fl4, ipc.oif, sk->sk_mark, tos, RT_SCOPE_UNIVERSE, - inet->hdrincl ? IPPROTO_RAW : sk->sk_protocol, + hdrincl ? IPPROTO_RAW : sk->sk_protocol, inet_sk_flowi_flags(sk) | - (inet->hdrincl ? FLOWI_FLAG_KNOWN_NH : 0), + (hdrincl ? FLOWI_FLAG_KNOWN_NH : 0), daddr, saddr, 0, 0, sk->sk_uid); - if (!inet->hdrincl) { + if (!hdrincl) { rfv.msg = msg; rfv.hlen = 0; @@ -645,7 +650,7 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) goto do_confirm; back_from_confirm: - if (inet->hdrincl) + if (hdrincl) err = raw_send_hdrinc(sk, &fl4, msg, len, &rt, msg->msg_flags, &ipc.sockc); From 9f49cbc7cd207f27ead81246d9553515aea68131 Mon Sep 17 00:00:00 2001 From: Tobias Jordan Date: Wed, 6 Dec 2017 15:23:23 +0100 Subject: [PATCH 0883/1013] net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case [ Upstream commit 589bf32f09852041fbd3b7ce1a9e703f95c230ba ] add appropriate calls to clk_disable_unprepare() by jumping to out_mdio in case orion_mdio_probe() returns -EPROBE_DEFER. Found by Linux Driver Verification project (linuxtesting.org). Fixes: 3d604da1e954 ("net: mvmdio: get and enable optional clock") Signed-off-by: Tobias Jordan Reviewed-by: Andrew Lunn Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/marvell/mvmdio.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/marvell/mvmdio.c b/drivers/net/ethernet/marvell/mvmdio.c index c9798210fa0f..0495487f7b42 100644 --- a/drivers/net/ethernet/marvell/mvmdio.c +++ b/drivers/net/ethernet/marvell/mvmdio.c @@ -344,7 +344,8 @@ static int orion_mdio_probe(struct platform_device *pdev) dev->regs + MVMDIO_ERR_INT_MASK); } else if (dev->err_interrupt == -EPROBE_DEFER) { - return -EPROBE_DEFER; + ret = -EPROBE_DEFER; + goto out_mdio; } if (pdev->dev.of_node) From a3927015a4bbcb73d35cdbed24f81efeba1d7a6a Mon Sep 17 00:00:00 2001 From: Tonghao Zhang Date: Fri, 22 Dec 2017 10:15:20 -0800 Subject: [PATCH 0884/1013] sctp: Replace use of sockets_allocated with specified macro. [ Upstream commit 8cb38a602478e9f806571f6920b0a3298aabf042 ] The patch(180d8cd942ce) replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to accessor macros. But the sockets_allocated field of sctp sock is not replaced at all. Then replace it now for unifying the code. Fixes: 180d8cd942ce ("foundations of per-cgroup memory pressure controlling.") Cc: Glauber Costa Signed-off-by: Tonghao Zhang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index d6163f7aefb1..1977238dc023 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -4413,7 +4413,7 @@ static int sctp_init_sock(struct sock *sk) SCTP_DBG_OBJCNT_INC(sock); local_bh_disable(); - percpu_counter_inc(&sctp_sockets_allocated); + sk_sockets_allocated_inc(sk); sock_prot_inuse_add(net, sk->sk_prot, 1); /* Nothing can fail after this block, otherwise @@ -4457,7 +4457,7 @@ static void sctp_destroy_sock(struct sock *sk) } sctp_endpoint_free(sp->ep); local_bh_disable(); - percpu_counter_dec(&sctp_sockets_allocated); + sk_sockets_allocated_dec(sk); sock_prot_inuse_add(sock_net(sk), sk->sk_prot, -1); local_bh_enable(); } From 6d1c489810bcde9266ccdbedc0861a5dc9778f60 Mon Sep 17 00:00:00 2001 From: "Nikita V. Shirokov" Date: Wed, 6 Dec 2017 17:15:43 -0800 Subject: [PATCH 0885/1013] adding missing rcu_read_unlock in ipxip6_rcv [ Upstream commit 74c4b656c3d92ec4c824ea1a4afd726b7b6568c8 ] commit 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") introduced new exit point in ipxip6_rcv. however rcu_read_unlock is missing there. this diff is fixing this v1->v2: instead of doing rcu_read_unlock in place, we are going to "drop" section (to prevent skb leakage) Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels") Signed-off-by: Nikita V. Shirokov Acked-by: Alexei Starovoitov Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_tunnel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index a1c24443cd9e..ef958d50746b 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -912,7 +912,7 @@ static int ipxip6_rcv(struct sk_buff *skb, u8 ipproto, if (t->parms.collect_md) { tun_dst = ipv6_tun_rx_dst(skb, 0, 0, 0); if (!tun_dst) - return 0; + goto drop; } ret = __ip6_tnl_rcv(t, skb, tpi, tun_dst, dscp_ecn_decapsulate, log_ecn_error); From 44319591ffa28770f12f33b70e4269d8cda8cb5f Mon Sep 17 00:00:00 2001 From: Alexey Kodanev Date: Wed, 20 Dec 2017 19:36:03 +0300 Subject: [PATCH 0886/1013] ip6_gre: fix device features for ioctl setup [ Upstream commit e5a9336adb317db55eb3fe8200856096f3c71109 ] When ip6gre is created using ioctl, its features, such as scatter-gather, GSO and tx-checksumming will be turned off: # ip -f inet6 tunnel add gre6 mode ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: off scatter-gather: off tcp-segmentation-offload: off generic-segmentation-offload: off [requested on] But when netlink is used, they will be enabled: # ip link add gre6 type ip6gre remote fd00::1 # ethtool -k gre6 (truncated output) tx-checksumming: on scatter-gather: on tcp-segmentation-offload: on generic-segmentation-offload: on This results in a loss of performance when gre6 is created via ioctl. The issue was found with LTP/gre tests. Fix it by moving the setup of device features to a separate function and invoke it with ndo_init callback because both netlink and ioctl will eventually call it via register_netdevice(): register_netdevice() - ndo_init() callback -> ip6gre_tunnel_init() or ip6gre_tap_init() - ip6gre_tunnel_init_common() - ip6gre_tnl_init_features() The moved code also contains two minor style fixes: * removed needless tab from GRE6_FEATURES on NETIF_F_HIGHDMA line. * fixed the issue reported by checkpatch: "Unnecessary parentheses around 'nt->encap.type == TUNNEL_ENCAP_NONE'" Fixes: ac4eb009e477 ("ip6gre: Add support for basic offloads offloads excluding GSO") Signed-off-by: Alexey Kodanev Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/ip6_gre.c | 57 ++++++++++++++++++++++++++-------------------- 1 file changed, 32 insertions(+), 25 deletions(-) diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c index 5d6bee070871..7a2df6646486 100644 --- a/net/ipv6/ip6_gre.c +++ b/net/ipv6/ip6_gre.c @@ -1020,6 +1020,36 @@ static void ip6gre_tunnel_setup(struct net_device *dev) eth_random_addr(dev->perm_addr); } +#define GRE6_FEATURES (NETIF_F_SG | \ + NETIF_F_FRAGLIST | \ + NETIF_F_HIGHDMA | \ + NETIF_F_HW_CSUM) + +static void ip6gre_tnl_init_features(struct net_device *dev) +{ + struct ip6_tnl *nt = netdev_priv(dev); + + dev->features |= GRE6_FEATURES; + dev->hw_features |= GRE6_FEATURES; + + if (!(nt->parms.o_flags & TUNNEL_SEQ)) { + /* TCP offload with GRE SEQ is not supported, nor + * can we support 2 levels of outer headers requiring + * an update. 
+ */ + if (!(nt->parms.o_flags & TUNNEL_CSUM) || + nt->encap.type == TUNNEL_ENCAP_NONE) { + dev->features |= NETIF_F_GSO_SOFTWARE; + dev->hw_features |= NETIF_F_GSO_SOFTWARE; + } + + /* Can use a lockless transmit, unless we generate + * output sequences + */ + dev->features |= NETIF_F_LLTX; + } +} + static int ip6gre_tunnel_init_common(struct net_device *dev) { struct ip6_tnl *tunnel; @@ -1054,6 +1084,8 @@ static int ip6gre_tunnel_init_common(struct net_device *dev) if (!(tunnel->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) dev->mtu -= 8; + ip6gre_tnl_init_features(dev); + return 0; } @@ -1302,11 +1334,6 @@ static const struct net_device_ops ip6gre_tap_netdev_ops = { .ndo_get_iflink = ip6_tnl_get_iflink, }; -#define GRE6_FEATURES (NETIF_F_SG | \ - NETIF_F_FRAGLIST | \ - NETIF_F_HIGHDMA | \ - NETIF_F_HW_CSUM) - static void ip6gre_tap_setup(struct net_device *dev) { @@ -1386,26 +1413,6 @@ static int ip6gre_newlink(struct net *src_net, struct net_device *dev, nt->net = dev_net(dev); ip6gre_tnl_link_config(nt, !tb[IFLA_MTU]); - dev->features |= GRE6_FEATURES; - dev->hw_features |= GRE6_FEATURES; - - if (!(nt->parms.o_flags & TUNNEL_SEQ)) { - /* TCP offload with GRE SEQ is not supported, nor - * can we support 2 levels of outer headers requiring - * an update. - */ - if (!(nt->parms.o_flags & TUNNEL_CSUM) || - (nt->encap.type == TUNNEL_ENCAP_NONE)) { - dev->features |= NETIF_F_GSO_SOFTWARE; - dev->hw_features |= NETIF_F_GSO_SOFTWARE; - } - - /* Can use a lockless transmit, unless we generate - * output sequences - */ - dev->features |= NETIF_F_LLTX; - } - err = register_netdevice(dev); if (err) goto out; From 27ccace9b982ea796e15ba9dc6af5941539c6be8 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Wed, 20 Dec 2017 19:34:19 +0200 Subject: [PATCH 0887/1013] ipv4: Fix use-after-free when flushing FIB tables [ Upstream commit b4681c2829e24943aadd1a7bb3a30d41d0a20050 ] Since commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") the local table uses the same trie allocated for the main table when custom rules are not in use. When a net namespace is dismantled, the main table is flushed and freed (via an RCU callback) before the local table. In case the callback is invoked before the local table is iterated, a use-after-free can occur. Fix this by iterating over the FIB tables in reverse order, so that the main table is always freed after the local table. v3: Reworded comment according to Alex's suggestion. v2: Add a comment to make the fix more explicit per Dave's and Alex's feedback. Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Ido Schimmel Reported-by: Fengguang Wu Acked-by: Alexander Duyck Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fib_frontend.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index 37819ab4cc74..d72874150905 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1274,14 +1274,19 @@ static int __net_init ip_fib_net_init(struct net *net) static void ip_fib_net_exit(struct net *net) { - unsigned int i; + int i; rtnl_lock(); #ifdef CONFIG_IP_MULTIPLE_TABLES RCU_INIT_POINTER(net->ipv4.fib_main, NULL); RCU_INIT_POINTER(net->ipv4.fib_default, NULL); #endif - for (i = 0; i < FIB_TABLE_HASHSZ; i++) { + /* Destroy the tables in reverse order to guarantee that the + * local table, ID 255, is destroyed before the main table, ID + * 254. 
This is necessary as the local table may contain + * references to data contained in the main table. + */ + for (i = FIB_TABLE_HASHSZ - 1; i >= 0; i--) { struct hlist_head *head = &net->ipv4.fib_table_hash[i]; struct hlist_node *tmp; struct fib_table *tb; From 126f42ecfcb48ec50b289124f23dafa499012650 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Mon, 18 Dec 2017 17:35:09 +0200 Subject: [PATCH 0888/1013] net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks [ Upstream commit 84aeb437ab98a2bce3d4b2111c79723aedfceb33 ] The early call to br_stp_change_bridge_id in bridge's newlink can cause a memory leak if an error occurs during the newlink because the fdb entries are not cleaned up if a different lladdr was specified, also another minor issue is that it generates fdb notifications with ifindex = 0. Another unrelated memory leak is the bridge sysfs entries which get added on NETDEV_REGISTER event, but are not cleaned up in the newlink error path. To remove this special case the call to br_stp_change_bridge_id is done after netdev register and we cleanup the bridge on changelink error via br_dev_delete to plug all leaks. This patch makes netlink bridge destruction on newlink error the same as dellink and ioctl del which is necessary since at that point we have a fully initialized bridge device. To reproduce the issue: $ ip l add br0 address 00:11:22:33:44:55 type bridge group_fwd_mask 1 RTNETLINK answers: Invalid argument $ rmmod bridge [ 1822.142525] ============================================================================= [ 1822.143640] BUG bridge_fdb_cache (Tainted: G O ): Objects remaining in bridge_fdb_cache on __kmem_cache_shutdown() [ 1822.144821] ----------------------------------------------------------------------------- [ 1822.145990] Disabling lock debugging due to kernel taint [ 1822.146732] INFO: Slab 0x0000000092a844b2 objects=32 used=2 fp=0x00000000fef011b0 flags=0x1ffff8000000100 [ 1822.147700] CPU: 2 PID: 13584 Comm: rmmod Tainted: G B O 4.15.0-rc2+ #87 [ 1822.148578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 1822.150008] Call Trace: [ 1822.150510] dump_stack+0x78/0xa9 [ 1822.151156] slab_err+0xb1/0xd3 [ 1822.151834] ? __kmalloc+0x1bb/0x1ce [ 1822.152546] __kmem_cache_shutdown+0x151/0x28b [ 1822.153395] shutdown_cache+0x13/0x144 [ 1822.154126] kmem_cache_destroy+0x1c0/0x1fb [ 1822.154669] SyS_delete_module+0x194/0x244 [ 1822.155199] ? trace_hardirqs_on_thunk+0x1a/0x1c [ 1822.155773] entry_SYSCALL_64_fastpath+0x23/0x9a [ 1822.156343] RIP: 0033:0x7f929bd38b17 [ 1822.156859] RSP: 002b:00007ffd160e9a98 EFLAGS: 00000202 ORIG_RAX: 00000000000000b0 [ 1822.157728] RAX: ffffffffffffffda RBX: 00005578316ba090 RCX: 00007f929bd38b17 [ 1822.158422] RDX: 00007f929bd9ec60 RSI: 0000000000000800 RDI: 00005578316ba0f0 [ 1822.159114] RBP: 0000000000000003 R08: 00007f929bff5f20 R09: 00007ffd160e8a11 [ 1822.159808] R10: 00007ffd160e9860 R11: 0000000000000202 R12: 00007ffd160e8a80 [ 1822.160513] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578316ba090 [ 1822.161278] INFO: Object 0x000000007645de29 @offset=0 [ 1822.161666] INFO: Object 0x00000000d5df2ab5 @offset=128 Fixes: 30313a3d5794 ("bridge: Handle IFLA_ADDRESS correctly when creating bridge device") Fixes: 5b8d5429daa0 ("bridge: netlink: register netdevice before executing changelink") Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/bridge/br_netlink.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c index de2152730809..08190db0a2dc 100644 --- a/net/bridge/br_netlink.c +++ b/net/bridge/br_netlink.c @@ -1223,19 +1223,20 @@ static int br_dev_newlink(struct net *src_net, struct net_device *dev, struct net_bridge *br = netdev_priv(dev); int err; + err = register_netdevice(dev); + if (err) + return err; + if (tb[IFLA_ADDRESS]) { spin_lock_bh(&br->lock); br_stp_change_bridge_id(br, nla_data(tb[IFLA_ADDRESS])); spin_unlock_bh(&br->lock); } - err = register_netdevice(dev); - if (err) - return err; - err = br_changelink(dev, tb, data, extack); if (err) - unregister_netdevice(dev); + br_dev_delete(dev, NULL); + return err; } From dd9a2648b3e35c2369f580215d916baf7e23253a Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 19 Dec 2017 11:27:56 -0600 Subject: [PATCH 0889/1013] net: Fix double free and memory corruption in get_net_ns_by_id() [ Upstream commit 21b5944350052d2583e82dd59b19a9ba94a007f0 ] (I can trivially verify that that idr_remove in cleanup_net happens after the network namespace count has dropped to zero --EWB) Function get_net_ns_by_id() does not check for net::count after it has found a peer in netns_ids idr. It may dereference a peer, after its count has already been finaly decremented. This leads to double free and memory corruption: put_net(peer) rtnl_lock() atomic_dec_and_test(&peer->count) [count=0] ... __put_net(peer) get_net_ns_by_id(net, id) spin_lock(&cleanup_list_lock) list_add(&net->cleanup_list, &cleanup_list) spin_unlock(&cleanup_list_lock) queue_work() peer = idr_find(&net->netns_ids, id) | get_net(peer) [count=1] | ... | (use after final put) v ... cleanup_net() ... spin_lock(&cleanup_list_lock) ... list_replace_init(&cleanup_list, ..) ... spin_unlock(&cleanup_list_lock) ... ... ... ... put_net(peer) ... atomic_dec_and_test(&peer->count) [count=0] ... spin_lock(&cleanup_list_lock) ... list_add(&net->cleanup_list, &cleanup_list) ... spin_unlock(&cleanup_list_lock) ... queue_work() ... rtnl_unlock() rtnl_lock() ... for_each_net(tmp) { ... id = __peernet2id(tmp, peer) ... spin_lock_irq(&tmp->nsid_lock) ... idr_remove(&tmp->netns_ids, id) ... ... ... net_drop_ns() ... net_free(peer) ... } ... | v cleanup_net() ... (Second free of peer) Also, put_net() on the right cpu may reorder with left's cpu list_replace_init(&cleanup_list, ..), and then cleanup_list will be corrupted. Since cleanup_net() is executed in worker thread, while put_net(peer) can happen everywhere, there should be enough time for concurrent get_net_ns_by_id() to pick the peer up, and the race does not seem to be unlikely. The patch fixes the problem in standard way. (Also, there is possible problem in peernet2id_alloc(), which requires check for net::count under nsid_lock and maybe_get_net(peer), but in current stable kernel it's used under rtnl_lock() and it has to be safe. Openswitch begun to use peernet2id_alloc(), and possibly it should be fixed too. While this is not in stable kernel yet, so I'll send a separate message to netdev@ later). Cc: Nicolas Dichtel Signed-off-by: Kirill Tkhai Fixes: 0c7aecd4bde4 "netns: add rtnl cmd to add and get peer netns ids" Reviewed-by: Andrey Ryabinin Reviewed-by: "Eric W. Biederman" Signed-off-by: Eric W. Biederman Reviewed-by: Eric Dumazet Acked-by: Nicolas Dichtel Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/core/net_namespace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c index 6cfdc7c84c48..0dd6359e5924 100644 --- a/net/core/net_namespace.c +++ b/net/core/net_namespace.c @@ -266,7 +266,7 @@ struct net *get_net_ns_by_id(struct net *net, int id) spin_lock_bh(&net->nsid_lock); peer = idr_find(&net->netns_ids, id); if (peer) - get_net(peer); + peer = maybe_get_net(peer); spin_unlock_bh(&net->nsid_lock); rcu_read_unlock(); From 003514ffb447120d1a5997827ad5226d7b5a82e1 Mon Sep 17 00:00:00 2001 From: Grygorii Strashko Date: Wed, 20 Dec 2017 18:45:10 -0600 Subject: [PATCH 0890/1013] net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround [ Upstream commit c1a8d0a3accf64a014d605e6806ce05d1c17adf1 ] Under some circumstances driver will perform PHY reset in ksz9031_read_status() to fix autoneg failure case (idle error count = 0xFF). When this happens ksz9031 will not detect link status change any more when connecting to Netgear 1G switch (link can be recovered sometimes by restarting netdevice "ifconfig down up"). Reproduced with TI am572x board equipped with ksz9031 PHY while connecting to Netgear 1G switch. Fix the issue by reconfiguring autonegotiation after PHY reset in ksz9031_read_status(). Fixes: d2fd719bcb0e ("net/phy: micrel: Add workaround for bad autoneg") Signed-off-by: Grygorii Strashko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/micrel.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index fdb43dd9b5cd..6c45ff650ec7 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -622,6 +622,7 @@ static int ksz9031_read_status(struct phy_device *phydev) phydev->link = 0; if (phydev->drv->config_intr && phy_interrupt_is_valid(phydev)) phydev->drv->config_intr(phydev); + return genphy_config_aneg(phydev); } return 0; From 265ba7a046c05f67f80c2540ec3212dafe5b3673 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Wed, 13 Dec 2017 14:41:06 -0500 Subject: [PATCH 0891/1013] sock: free skb in skb_complete_tx_timestamp on error [ Upstream commit 35b99dffc3f710cafceee6c8c6ac6a98eb2cb4bf ] skb_complete_tx_timestamp must ingest the skb it is passed. Call kfree_skb if the skb cannot be enqueued. Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Fixes: 9ac25fc06375 ("net: fix socket refcounting in skb_complete_tx_timestamp()") Reported-by: Richard Cochran Signed-off-by: Willem de Bruijn Reviewed-by: Eric Dumazet Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index e140ba49b30a..623143d7844d 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4296,7 +4296,7 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, struct sock *sk = skb->sk; if (!skb_may_tx_timestamp(sk, false)) - return; + goto err; /* Take a reference to prevent skb_orphan() from freeing the socket, * but only if the socket refcount is not zero. 
@@ -4305,7 +4305,11 @@ void skb_complete_tx_timestamp(struct sk_buff *skb, *skb_hwtstamps(skb) = *hwtstamps; __skb_complete_tx_timestamp(skb, sk, SCM_TSTAMP_SND, false); sock_put(sk); + return; } + +err: + kfree_skb(skb); } EXPORT_SYMBOL_GPL(skb_complete_tx_timestamp); From f35318b289446015ce2d2cbfd240ba47bda6f187 Mon Sep 17 00:00:00 2001 From: Yousuk Seung Date: Thu, 7 Dec 2017 13:41:34 -0800 Subject: [PATCH 0892/1013] tcp: invalidate rate samples during SACK reneging [ Upstream commit d4761754b4fb2ef8d9a1e9d121c4bec84e1fe292 ] Mark tcp_sock during a SACK reneging event and invalidate rate samples while marked. Such rate samples may overestimate bw by including packets that were SACKed before reneging. < ack 6001 win 10000 sack 7001:38001 < ack 7001 win 0 sack 8001:38001 // Reneg detected > seq 7001:8001 // RTO, SACK cleared. < ack 38001 win 10000 In above example the rate sample taken after the last ack will count 7001-38001 as delivered while the actual delivery rate likely could be much lower i.e. 7001-8001. This patch adds a new field tcp_sock.sack_reneg and marks it when we declare SACK reneging and entering TCP_CA_Loss, and unmarks it after the last rate sample was taken before moving back to TCP_CA_Open. This patch also invalidates rate samples taken while tcp_sock.is_sack_reneg is set. Fixes: b9f64820fb22 ("tcp: track data delivery rate for a TCP connection") Signed-off-by: Yousuk Seung Signed-off-by: Neal Cardwell Signed-off-by: Yuchung Cheng Acked-by: Soheil Hassas Yeganeh Acked-by: Eric Dumazet Acked-by: Priyaranjan Jha Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- include/linux/tcp.h | 3 ++- include/net/tcp.h | 2 +- net/ipv4/tcp.c | 1 + net/ipv4/tcp_input.c | 10 ++++++++-- net/ipv4/tcp_rate.c | 10 +++++++--- 5 files changed, 19 insertions(+), 7 deletions(-) diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 4aa40ef02d32..e8418fc77a43 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -214,7 +214,8 @@ struct tcp_sock { u8 chrono_type:2, /* current chronograph type */ rate_app_limited:1, /* rate_{delivered,interval_us} limited? */ fastopen_connect:1, /* FASTOPEN_CONNECT sockopt */ - unused:4; + is_sack_reneg:1, /* in recovery from loss with SACK reneg? */ + unused:3; u8 nonagle : 4,/* Disable Nagle algorithm? 
*/ thin_lto : 1,/* Use linear timeouts for thin streams */ unused1 : 1, diff --git a/include/net/tcp.h b/include/net/tcp.h index 6ced69940f5c..0a13574134b8 100644 --- a/include/net/tcp.h +++ b/include/net/tcp.h @@ -1085,7 +1085,7 @@ void tcp_rate_skb_sent(struct sock *sk, struct sk_buff *skb); void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb, struct rate_sample *rs); void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost, - struct rate_sample *rs); + bool is_sack_reneg, struct rate_sample *rs); void tcp_rate_check_app_limited(struct sock *sk); /* These functions determine how the current flow behaves in respect of SACK diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5091402720ab..a0c72b09cefc 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2356,6 +2356,7 @@ int tcp_disconnect(struct sock *sk, int flags) tp->snd_cwnd_cnt = 0; tp->window_clamp = 0; tcp_set_ca_state(sk, TCP_CA_Open); + tp->is_sack_reneg = 0; tcp_clear_retrans(tp); inet_csk_delack_init(sk); /* Initialize rcv_mss to TCP_MIN_MSS to avoid division by 0 diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index c5447b9f8517..a965a38211ba 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -1975,6 +1975,8 @@ void tcp_enter_loss(struct sock *sk) NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSACKRENEGING); tp->sacked_out = 0; tp->fackets_out = 0; + /* Mark SACK reneging until we recover from this loss event. */ + tp->is_sack_reneg = 1; } tcp_clear_all_retrans_hints(tp); @@ -2428,6 +2430,7 @@ static bool tcp_try_undo_recovery(struct sock *sk) return true; } tcp_set_ca_state(sk, TCP_CA_Open); + tp->is_sack_reneg = 0; return false; } @@ -2459,8 +2462,10 @@ static bool tcp_try_undo_loss(struct sock *sk, bool frto_undo) NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPSPURIOUSRTOS); inet_csk(sk)->icsk_retransmits = 0; - if (frto_undo || tcp_is_sack(tp)) + if (frto_undo || tcp_is_sack(tp)) { tcp_set_ca_state(sk, TCP_CA_Open); + tp->is_sack_reneg = 0; + } return true; } return false; @@ -3551,6 +3556,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) struct tcp_sacktag_state sack_state; struct rate_sample rs = { .prior_delivered = 0 }; u32 prior_snd_una = tp->snd_una; + bool is_sack_reneg = tp->is_sack_reneg; u32 ack_seq = TCP_SKB_CB(skb)->seq; u32 ack = TCP_SKB_CB(skb)->ack_seq; bool is_dupack = false; @@ -3666,7 +3672,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag) delivered = tp->delivered - delivered; /* freshly ACKed or SACKed */ lost = tp->lost - lost; /* freshly marked lost */ - tcp_rate_gen(sk, delivered, lost, sack_state.rate); + tcp_rate_gen(sk, delivered, lost, is_sack_reneg, sack_state.rate); tcp_cong_control(sk, ack, delivered, flag, sack_state.rate); tcp_xmit_recovery(sk, rexmit); return 1; diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c index 3330a370d306..c61240e43923 100644 --- a/net/ipv4/tcp_rate.c +++ b/net/ipv4/tcp_rate.c @@ -106,7 +106,7 @@ void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff *skb, /* Update the connection delivery information and generate a rate sample. */ void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost, - struct rate_sample *rs) + bool is_sack_reneg, struct rate_sample *rs) { struct tcp_sock *tp = tcp_sk(sk); u32 snd_us, ack_us; @@ -124,8 +124,12 @@ void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost, rs->acked_sacked = delivered; /* freshly ACKed or SACKed */ rs->losses = lost; /* freshly marked lost */ - /* Return an invalid sample if no timing information is available. 
*/ - if (!rs->prior_mstamp) { + /* Return an invalid sample if no timing information is available or + * in recovery from loss with SACK reneging. Rate samples taken during + * a SACK reneging event may overestimate bw by including packets that + * were SACKed before the reneg. + */ + if (!rs->prior_mstamp || is_sack_reneg) { rs->delivered = -1; rs->interval_us = -1; return; From 3ddcb727c71779c23097d440355c4abe27379558 Mon Sep 17 00:00:00 2001 From: Eran Ben Elisha Date: Mon, 13 Nov 2017 10:11:27 +0200 Subject: [PATCH 0893/1013] net/mlx5: Fix rate limit packet pacing naming and struct [ Upstream commit 37e92a9d4fe38dc3e7308913575983a6a088c8d4 ] In mlx5_ifc, struct size was not complete, and thus driver was sending garbage after the last defined field. Fixed it by adding reserved field to complete the struct size. In addition, rename all set_rate_limit to set_pp_rate_limit to be compliant with the Firmware <-> Driver definition. Fixes: 7486216b3a0b ("{net,IB}/mlx5: mlx5_ifc updates") Fixes: 1466cc5b23d1 ("net/mlx5: Rate limit tables support") Signed-off-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 ++-- drivers/net/ethernet/mellanox/mlx5/core/rl.c | 22 +++++++++---------- include/linux/mlx5/mlx5_ifc.h | 8 ++++--- 3 files changed, 18 insertions(+), 16 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 1fffdebbc9e8..e9a1fbcc4adf 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -362,7 +362,7 @@ static int mlx5_internal_err_ret_value(struct mlx5_core_dev *dev, u16 op, case MLX5_CMD_OP_QUERY_VPORT_COUNTER: case MLX5_CMD_OP_ALLOC_Q_COUNTER: case MLX5_CMD_OP_QUERY_Q_COUNTER: - case MLX5_CMD_OP_SET_RATE_LIMIT: + case MLX5_CMD_OP_SET_PP_RATE_LIMIT: case MLX5_CMD_OP_QUERY_RATE_LIMIT: case MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT: case MLX5_CMD_OP_QUERY_SCHEDULING_ELEMENT: @@ -505,7 +505,7 @@ const char *mlx5_command_str(int command) MLX5_COMMAND_STR_CASE(ALLOC_Q_COUNTER); MLX5_COMMAND_STR_CASE(DEALLOC_Q_COUNTER); MLX5_COMMAND_STR_CASE(QUERY_Q_COUNTER); - MLX5_COMMAND_STR_CASE(SET_RATE_LIMIT); + MLX5_COMMAND_STR_CASE(SET_PP_RATE_LIMIT); MLX5_COMMAND_STR_CASE(QUERY_RATE_LIMIT); MLX5_COMMAND_STR_CASE(CREATE_SCHEDULING_ELEMENT); MLX5_COMMAND_STR_CASE(DESTROY_SCHEDULING_ELEMENT); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/rl.c b/drivers/net/ethernet/mellanox/mlx5/core/rl.c index e651e4c02867..d3c33e9eea72 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/rl.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/rl.c @@ -125,16 +125,16 @@ static struct mlx5_rl_entry *find_rl_entry(struct mlx5_rl_table *table, return ret_entry; } -static int mlx5_set_rate_limit_cmd(struct mlx5_core_dev *dev, +static int mlx5_set_pp_rate_limit_cmd(struct mlx5_core_dev *dev, u32 rate, u16 index) { - u32 in[MLX5_ST_SZ_DW(set_rate_limit_in)] = {0}; - u32 out[MLX5_ST_SZ_DW(set_rate_limit_out)] = {0}; + u32 in[MLX5_ST_SZ_DW(set_pp_rate_limit_in)] = {0}; + u32 out[MLX5_ST_SZ_DW(set_pp_rate_limit_out)] = {0}; - MLX5_SET(set_rate_limit_in, in, opcode, - MLX5_CMD_OP_SET_RATE_LIMIT); - MLX5_SET(set_rate_limit_in, in, rate_limit_index, index); - MLX5_SET(set_rate_limit_in, in, rate_limit, rate); + MLX5_SET(set_pp_rate_limit_in, in, opcode, + MLX5_CMD_OP_SET_PP_RATE_LIMIT); + MLX5_SET(set_pp_rate_limit_in, in, rate_limit_index, index); + MLX5_SET(set_pp_rate_limit_in, in, rate_limit, rate); return 
mlx5_cmd_exec(dev, in, sizeof(in), out, sizeof(out)); } @@ -173,7 +173,7 @@ int mlx5_rl_add_rate(struct mlx5_core_dev *dev, u32 rate, u16 *index) entry->refcount++; } else { /* new rate limit */ - err = mlx5_set_rate_limit_cmd(dev, rate, entry->index); + err = mlx5_set_pp_rate_limit_cmd(dev, rate, entry->index); if (err) { mlx5_core_err(dev, "Failed configuring rate: %u (%d)\n", rate, err); @@ -209,7 +209,7 @@ void mlx5_rl_remove_rate(struct mlx5_core_dev *dev, u32 rate) entry->refcount--; if (!entry->refcount) { /* need to remove rate */ - mlx5_set_rate_limit_cmd(dev, 0, entry->index); + mlx5_set_pp_rate_limit_cmd(dev, 0, entry->index); entry->rate = 0; } @@ -262,8 +262,8 @@ void mlx5_cleanup_rl_table(struct mlx5_core_dev *dev) /* Clear all configured rates */ for (i = 0; i < table->max_size; i++) if (table->rl_entry[i].rate) - mlx5_set_rate_limit_cmd(dev, 0, - table->rl_entry[i].index); + mlx5_set_pp_rate_limit_cmd(dev, 0, + table->rl_entry[i].index); kfree(dev->priv.rl_table.rl_entry); } diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 69772347f866..c8091f06eaa4 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -147,7 +147,7 @@ enum { MLX5_CMD_OP_ALLOC_Q_COUNTER = 0x771, MLX5_CMD_OP_DEALLOC_Q_COUNTER = 0x772, MLX5_CMD_OP_QUERY_Q_COUNTER = 0x773, - MLX5_CMD_OP_SET_RATE_LIMIT = 0x780, + MLX5_CMD_OP_SET_PP_RATE_LIMIT = 0x780, MLX5_CMD_OP_QUERY_RATE_LIMIT = 0x781, MLX5_CMD_OP_CREATE_SCHEDULING_ELEMENT = 0x782, MLX5_CMD_OP_DESTROY_SCHEDULING_ELEMENT = 0x783, @@ -7233,7 +7233,7 @@ struct mlx5_ifc_add_vxlan_udp_dport_in_bits { u8 vxlan_udp_port[0x10]; }; -struct mlx5_ifc_set_rate_limit_out_bits { +struct mlx5_ifc_set_pp_rate_limit_out_bits { u8 status[0x8]; u8 reserved_at_8[0x18]; @@ -7242,7 +7242,7 @@ struct mlx5_ifc_set_rate_limit_out_bits { u8 reserved_at_40[0x40]; }; -struct mlx5_ifc_set_rate_limit_in_bits { +struct mlx5_ifc_set_pp_rate_limit_in_bits { u8 opcode[0x10]; u8 reserved_at_10[0x10]; @@ -7255,6 +7255,8 @@ struct mlx5_ifc_set_rate_limit_in_bits { u8 reserved_at_60[0x20]; u8 rate_limit[0x20]; + + u8 reserved_at_a0[0x160]; }; struct mlx5_ifc_access_register_out_bits { From 2dc5654e6fbc46161c56058081258bd3ad8d1ab0 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Thu, 23 Nov 2017 13:52:28 +0200 Subject: [PATCH 0894/1013] net/mlx5e: Fix possible deadlock of VXLAN lock [ Upstream commit 6323514116404cc651df1b7fffa1311ddf8ce647 ] mlx5e_vxlan_lookup_port is called both from mlx5e_add_vxlan_port (user context) and mlx5e_features_check (softirq), but the lock acquired does not disable bottom half and might result in deadlock. Fix it by simply replacing spin_lock() with spin_lock_bh(). While at it, replace all unnecessary spin_lock_irq() to spin_lock_bh(). lockdep's WARNING: inconsistent lock state [ 654.028136] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. 
[ 654.028229] swapper/5/0 [HC0[0]:SC1[9]:HE1:SE0] takes: [ 654.028321] (&(&vxlan_db->lock)->rlock){+.?.}, at: [] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028528] {SOFTIRQ-ON-W} state was registered at: [ 654.028607] _raw_spin_lock+0x3c/0x70 [ 654.028689] mlx5e_vxlan_lookup_port+0x1e/0x50 [mlx5_core] [ 654.028794] mlx5e_vxlan_add_port+0x2e/0x120 [mlx5_core] [ 654.028878] process_one_work+0x1e9/0x640 [ 654.028942] worker_thread+0x4a/0x3f0 [ 654.029002] kthread+0x141/0x180 [ 654.029056] ret_from_fork+0x24/0x30 [ 654.029114] irq event stamp: 579088 [ 654.029174] hardirqs last enabled at (579088): [] ip6_finish_output2+0x49a/0x8c0 [ 654.029309] hardirqs last disabled at (579087): [] ip6_finish_output2+0x44e/0x8c0 [ 654.029446] softirqs last enabled at (579030): [] irq_enter+0x6d/0x80 [ 654.029567] softirqs last disabled at (579031): [] irq_exit+0xb5/0xc0 [ 654.029684] other info that might help us debug this: [ 654.029781] Possible unsafe locking scenario: [ 654.029868] CPU0 [ 654.029908] ---- [ 654.029947] lock(&(&vxlan_db->lock)->rlock); [ 654.030045] [ 654.030090] lock(&(&vxlan_db->lock)->rlock); [ 654.030162] *** DEADLOCK *** Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- .../net/ethernet/mellanox/mlx5/core/vxlan.c | 20 +++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c index 07a9ba6cfc70..f8238275759f 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c @@ -71,9 +71,9 @@ struct mlx5e_vxlan *mlx5e_vxlan_lookup_port(struct mlx5e_priv *priv, u16 port) struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan; struct mlx5e_vxlan *vxlan; - spin_lock(&vxlan_db->lock); + spin_lock_bh(&vxlan_db->lock); vxlan = radix_tree_lookup(&vxlan_db->tree, port); - spin_unlock(&vxlan_db->lock); + spin_unlock_bh(&vxlan_db->lock); return vxlan; } @@ -100,9 +100,9 @@ static void mlx5e_vxlan_add_port(struct work_struct *work) vxlan->udp_port = port; - spin_lock_irq(&vxlan_db->lock); + spin_lock_bh(&vxlan_db->lock); err = radix_tree_insert(&vxlan_db->tree, vxlan->udp_port, vxlan); - spin_unlock_irq(&vxlan_db->lock); + spin_unlock_bh(&vxlan_db->lock); if (err) goto err_free; @@ -121,9 +121,9 @@ static void __mlx5e_vxlan_core_del_port(struct mlx5e_priv *priv, u16 port) struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan; struct mlx5e_vxlan *vxlan; - spin_lock_irq(&vxlan_db->lock); + spin_lock_bh(&vxlan_db->lock); vxlan = radix_tree_delete(&vxlan_db->tree, port); - spin_unlock_irq(&vxlan_db->lock); + spin_unlock_bh(&vxlan_db->lock); if (!vxlan) return; @@ -171,12 +171,12 @@ void mlx5e_vxlan_cleanup(struct mlx5e_priv *priv) struct mlx5e_vxlan *vxlan; unsigned int port = 0; - spin_lock_irq(&vxlan_db->lock); + spin_lock_bh(&vxlan_db->lock); while (radix_tree_gang_lookup(&vxlan_db->tree, (void **)&vxlan, port, 1)) { port = vxlan->udp_port; - spin_unlock_irq(&vxlan_db->lock); + spin_unlock_bh(&vxlan_db->lock); __mlx5e_vxlan_core_del_port(priv, (u16)port); - spin_lock_irq(&vxlan_db->lock); + spin_lock_bh(&vxlan_db->lock); } - spin_unlock_irq(&vxlan_db->lock); + spin_unlock_bh(&vxlan_db->lock); } From 597181622e649b30a4c4d6bec4816d2f8987d25a Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Tue, 21 Nov 2017 17:49:36 +0200 Subject: [PATCH 0895/1013] net/mlx5e: Fix features check of IPv6 traffic [ Upstream commit 
2989ad1ec03021ee6d2193c35414f1d970a243de ] The assumption that the next header field contains the transport protocol is wrong for IPv6 packets with extension headers. Instead, we should look the inner-most next header field in the buffer. This will fix TSO offload for tunnels over IPv6 with extension headers. Performance testing: 19.25x improvement, cool! Measuring bandwidth of 16 threads TCP traffic over IPv6 GRE tap. CPU: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz NIC: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] TSO: Enabled Before: 4,926.24 Mbps Now : 94,827.91 Mbps Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index cc11bbbd0309..517727f277c5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3554,6 +3554,7 @@ static netdev_features_t mlx5e_tunnel_features_check(struct mlx5e_priv *priv, struct sk_buff *skb, netdev_features_t features) { + unsigned int offset = 0; struct udphdr *udph; u8 proto; u16 port; @@ -3563,7 +3564,7 @@ static netdev_features_t mlx5e_tunnel_features_check(struct mlx5e_priv *priv, proto = ip_hdr(skb)->protocol; break; case htons(ETH_P_IPV6): - proto = ipv6_hdr(skb)->nexthdr; + proto = ipv6_find_hdr(skb, &offset, -1, NULL, NULL); break; default: goto out; From c4d0e614c151f102967ff38bdf7c4372c146a5a6 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Sun, 3 Dec 2017 13:58:50 +0200 Subject: [PATCH 0896/1013] net/mlx5e: Add refcount to VXLAN structure [ Upstream commit 23f4cc2cd9ed92570647220aca60d0197d8c1fa9 ] A refcount mechanism must be implemented in order to prevent unwanted scenarios such as: - Open an IPv4 VXLAN interface - Open an IPv6 VXLAN interface (different socket) - Remove one of the interfaces With current implementation, the UDP port will be removed from our VXLAN database and turn off the offloads for the other interface, which is still active. The reference count mechanism will only allow UDP port removals once all consumers are gone. 
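In short, the add path now takes a reference on an already-offloaded UDP port instead of bailing out, and the del path only tears the offload down once the last reference is dropped. A condensed sketch of the resulting logic, pulled from the diff below with locking and error handling trimmed (so not a standalone excerpt):

	/* add: port already offloaded, just take another reference */
	vxlan = mlx5e_vxlan_lookup_port(priv, port);
	if (vxlan) {
		atomic_inc(&vxlan->refcount);
		goto free_work;
	}
	/* ... otherwise program the HW offload and allocate the entry ... */
	atomic_set(&vxlan->refcount, 1);	/* first consumer of this port */

	/* del: only the last consumer removes the port from HW */
	if (atomic_dec_and_test(&vxlan->refcount)) {
		radix_tree_delete(&vxlan_db->tree, port);
		mlx5e_vxlan_core_del_port_cmd(priv->mdev, port);
		kfree(vxlan);
	}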
Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- .../net/ethernet/mellanox/mlx5/core/vxlan.c | 50 ++++++++++--------- .../net/ethernet/mellanox/mlx5/core/vxlan.h | 1 + 2 files changed, 28 insertions(+), 23 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c index f8238275759f..25f782344667 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c @@ -88,8 +88,11 @@ static void mlx5e_vxlan_add_port(struct work_struct *work) struct mlx5e_vxlan *vxlan; int err; - if (mlx5e_vxlan_lookup_port(priv, port)) + vxlan = mlx5e_vxlan_lookup_port(priv, port); + if (vxlan) { + atomic_inc(&vxlan->refcount); goto free_work; + } if (mlx5e_vxlan_core_add_port_cmd(priv->mdev, port)) goto free_work; @@ -99,6 +102,7 @@ static void mlx5e_vxlan_add_port(struct work_struct *work) goto err_delete_port; vxlan->udp_port = port; + atomic_set(&vxlan->refcount, 1); spin_lock_bh(&vxlan_db->lock); err = radix_tree_insert(&vxlan_db->tree, vxlan->udp_port, vxlan); @@ -116,32 +120,33 @@ static void mlx5e_vxlan_add_port(struct work_struct *work) kfree(vxlan_work); } -static void __mlx5e_vxlan_core_del_port(struct mlx5e_priv *priv, u16 port) +static void mlx5e_vxlan_del_port(struct work_struct *work) { + struct mlx5e_vxlan_work *vxlan_work = + container_of(work, struct mlx5e_vxlan_work, work); + struct mlx5e_priv *priv = vxlan_work->priv; struct mlx5e_vxlan_db *vxlan_db = &priv->vxlan; + u16 port = vxlan_work->port; struct mlx5e_vxlan *vxlan; + bool remove = false; spin_lock_bh(&vxlan_db->lock); - vxlan = radix_tree_delete(&vxlan_db->tree, port); - spin_unlock_bh(&vxlan_db->lock); - + vxlan = radix_tree_lookup(&vxlan_db->tree, port); if (!vxlan) - return; - - mlx5e_vxlan_core_del_port_cmd(priv->mdev, vxlan->udp_port); - - kfree(vxlan); -} + goto out_unlock; -static void mlx5e_vxlan_del_port(struct work_struct *work) -{ - struct mlx5e_vxlan_work *vxlan_work = - container_of(work, struct mlx5e_vxlan_work, work); - struct mlx5e_priv *priv = vxlan_work->priv; - u16 port = vxlan_work->port; + if (atomic_dec_and_test(&vxlan->refcount)) { + radix_tree_delete(&vxlan_db->tree, port); + remove = true; + } - __mlx5e_vxlan_core_del_port(priv, port); +out_unlock: + spin_unlock_bh(&vxlan_db->lock); + if (remove) { + mlx5e_vxlan_core_del_port_cmd(priv->mdev, port); + kfree(vxlan); + } kfree(vxlan_work); } @@ -171,12 +176,11 @@ void mlx5e_vxlan_cleanup(struct mlx5e_priv *priv) struct mlx5e_vxlan *vxlan; unsigned int port = 0; - spin_lock_bh(&vxlan_db->lock); + /* Lockless since we are the only radix-tree consumers, wq is disabled */ while (radix_tree_gang_lookup(&vxlan_db->tree, (void **)&vxlan, port, 1)) { port = vxlan->udp_port; - spin_unlock_bh(&vxlan_db->lock); - __mlx5e_vxlan_core_del_port(priv, (u16)port); - spin_lock_bh(&vxlan_db->lock); + radix_tree_delete(&vxlan_db->tree, port); + mlx5e_vxlan_core_del_port_cmd(priv->mdev, port); + kfree(vxlan); } - spin_unlock_bh(&vxlan_db->lock); } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.h b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.h index 5def12c048e3..5ef6ae7d568a 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.h @@ -36,6 +36,7 @@ #include "en.h" struct mlx5e_vxlan { + atomic_t refcount; u16 udp_port; }; From 999755ec40a6206d0c9f3edfffe2b00a7a66a6e9 Mon Sep 17 
00:00:00 2001 From: Gal Pressman Date: Mon, 4 Dec 2017 09:57:43 +0200 Subject: [PATCH 0897/1013] net/mlx5e: Prevent possible races in VXLAN control flow [ Upstream commit 0c1cc8b2215f5122ca614b5adca60346018758c3 ] When calling add/remove VXLAN port, a lock must be held in order to prevent race scenarios when more than one add/remove happens at the same time. Fix by holding our state_lock (mutex) as done by all other parts of the driver. Note that the spinlock protecting the radix-tree is still needed in order to synchronize radix-tree access from softirq context. Fixes: b3f63c3d5e2c ("net/mlx5e: Add netdev support for VXLAN tunneling") Signed-off-by: Gal Pressman Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/vxlan.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c index 25f782344667..2f74953e4561 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/vxlan.c @@ -88,6 +88,7 @@ static void mlx5e_vxlan_add_port(struct work_struct *work) struct mlx5e_vxlan *vxlan; int err; + mutex_lock(&priv->state_lock); vxlan = mlx5e_vxlan_lookup_port(priv, port); if (vxlan) { atomic_inc(&vxlan->refcount); @@ -117,6 +118,7 @@ static void mlx5e_vxlan_add_port(struct work_struct *work) err_delete_port: mlx5e_vxlan_core_del_port_cmd(priv->mdev, port); free_work: + mutex_unlock(&priv->state_lock); kfree(vxlan_work); } @@ -130,6 +132,7 @@ static void mlx5e_vxlan_del_port(struct work_struct *work) struct mlx5e_vxlan *vxlan; bool remove = false; + mutex_lock(&priv->state_lock); spin_lock_bh(&vxlan_db->lock); vxlan = radix_tree_lookup(&vxlan_db->tree, port); if (!vxlan) @@ -147,6 +150,7 @@ static void mlx5e_vxlan_del_port(struct work_struct *work) mlx5e_vxlan_core_del_port_cmd(priv->mdev, port); kfree(vxlan); } + mutex_unlock(&priv->state_lock); kfree(vxlan_work); } From bf070305213031e1300070d10334ac012116d470 Mon Sep 17 00:00:00 2001 From: Moni Shoua Date: Mon, 4 Dec 2017 08:59:25 +0200 Subject: [PATCH 0898/1013] net/mlx5: Fix error flow in CREATE_QP command [ Upstream commit dbff26e44dc3ec4de6578733b054a0114652a764 ] In error flow, when DESTROY_QP command should be executed, the wrong mailbox was set with data, not the one that is written to hardware, Fix that. 
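Concretely, as the one-hunk diff below shows, the DESTROY_QP opcode and qpn were written into the CREATE_QP input mailbox 'in', while the mailbox actually handed to the firmware is 'din':

	/* before: destroy command filled into the wrong buffer */
	MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);

	/* after: fill the buffer that mlx5_cmd_exec() really sends */
	MLX5_SET(destroy_qp_in, din, opcode, MLX5_CMD_OP_DESTROY_QP);
	MLX5_SET(destroy_qp_in, din, qpn, qp->qpn);
	mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout));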
Fixes: 09a7d9eca1a6 '{net,IB}/mlx5: QP/XRCD commands via mlx5 ifc' Signed-off-by: Moni Shoua Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/qp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/qp.c b/drivers/net/ethernet/mellanox/mlx5/core/qp.c index db9e665ab104..889130edb715 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/qp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/qp.c @@ -213,8 +213,8 @@ int mlx5_core_create_qp(struct mlx5_core_dev *dev, err_cmd: memset(din, 0, sizeof(din)); memset(dout, 0, sizeof(dout)); - MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP); - MLX5_SET(destroy_qp_in, in, qpn, qp->qpn); + MLX5_SET(destroy_qp_in, din, opcode, MLX5_CMD_OP_DESTROY_QP); + MLX5_SET(destroy_qp_in, din, qpn, qp->qpn); mlx5_cmd_exec(dev, din, sizeof(din), dout, sizeof(dout)); return err; } From a6cc63e125ffb3ae9f6b4e2b4642ddea9a932b46 Mon Sep 17 00:00:00 2001 From: Eric Garver Date: Wed, 20 Dec 2017 15:09:22 -0500 Subject: [PATCH 0899/1013] openvswitch: Fix pop_vlan action for double tagged frames [ Upstream commit c48e74736fccf25fb32bb015426359e1c2016e3b ] skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us. Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets") Signed-off-by: Eric Garver Reviewed-by: Jiri Benc Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/openvswitch/flow.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c index cfb652a4e007..dbe1079a1651 100644 --- a/net/openvswitch/flow.c +++ b/net/openvswitch/flow.c @@ -532,6 +532,7 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) return -EINVAL; skb_reset_network_header(skb); + key->eth.type = skb->protocol; } else { eth = eth_hdr(skb); ether_addr_copy(key->eth.src, eth->h_source); @@ -545,15 +546,23 @@ static int key_extract(struct sk_buff *skb, struct sw_flow_key *key) if (unlikely(parse_vlan(skb, key))) return -ENOMEM; - skb->protocol = parse_ethertype(skb); - if (unlikely(skb->protocol == htons(0))) + key->eth.type = parse_ethertype(skb); + if (unlikely(key->eth.type == htons(0))) return -ENOMEM; + /* Multiple tagged packets need to retain TPID to satisfy + * skb_vlan_pop(), which will later shift the ethertype into + * skb->protocol. + */ + if (key->eth.cvlan.tci & htons(VLAN_TAG_PRESENT)) + skb->protocol = key->eth.cvlan.tpid; + else + skb->protocol = key->eth.type; + skb_reset_network_header(skb); __skb_push(skb, skb->data - skb_mac_header(skb)); } skb_reset_mac_len(skb); - key->eth.type = skb->protocol; /* Network layer. */ if (key->eth.type == htons(ETH_P_IP)) { From 701768dc9a1030d3f02b9b3e9b4690d15f637dad Mon Sep 17 00:00:00 2001 From: Bert Kenward Date: Thu, 7 Dec 2017 17:18:58 +0000 Subject: [PATCH 0900/1013] sfc: pass valid pointers from efx_enqueue_unwind [ Upstream commit d4a7a8893d4cdbc89d79ac4aa704bf8d4b67b368 ] The bytes_compl and pkts_compl pointers passed to efx_dequeue_buffers cannot be NULL. Add a paranoid warning to check this condition and fix the one case where they were NULL. efx_enqueue_unwind() is called very rarely, during error handling. Without this fix it would fail with a NULL pointer dereference in efx_dequeue_buffer, with efx_enqueue_skb in the call stack. 
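Condensed from the diff below, the unwind path now passes throwaway counters rather than NULL, and efx_dequeue_buffer() gains a paranoid check:

	/* efx_dequeue_buffer(): callers must supply the counters */
	EFX_WARN_ON_PARANOID(!pkts_compl || !bytes_compl);

	/* efx_enqueue_unwind(): local counters instead of NULL */
	unsigned int bytes_compl = 0;
	unsigned int pkts_compl = 0;
	...
	efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl);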
Fixes: e9117e5099ea ("sfc: Firmware-Assisted TSO version 2") Reported-by: Jarod Wilson Signed-off-by: Bert Kenward Tested-by: Jarod Wilson Acked-by: Jarod Wilson Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/sfc/tx.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c index 32bf1fecf864..9b85cbd5a231 100644 --- a/drivers/net/ethernet/sfc/tx.c +++ b/drivers/net/ethernet/sfc/tx.c @@ -77,6 +77,7 @@ static void efx_dequeue_buffer(struct efx_tx_queue *tx_queue, } if (buffer->flags & EFX_TX_BUF_SKB) { + EFX_WARN_ON_PARANOID(!pkts_compl || !bytes_compl); (*pkts_compl)++; (*bytes_compl) += buffer->skb->len; dev_consume_skb_any((struct sk_buff *)buffer->skb); @@ -426,12 +427,14 @@ static int efx_tx_map_data(struct efx_tx_queue *tx_queue, struct sk_buff *skb, static void efx_enqueue_unwind(struct efx_tx_queue *tx_queue) { struct efx_tx_buffer *buffer; + unsigned int bytes_compl = 0; + unsigned int pkts_compl = 0; /* Work backwards until we hit the original insert pointer value */ while (tx_queue->insert_count != tx_queue->write_count) { --tx_queue->insert_count; buffer = __efx_tx_queue_get_insert_buffer(tx_queue); - efx_dequeue_buffer(tx_queue, buffer, NULL, NULL); + efx_dequeue_buffer(tx_queue, buffer, &pkts_compl, &bytes_compl); } } From ff1ff3815c2483ac19bd6f926a535dd4cea27e2e Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Tue, 21 Nov 2017 17:37:46 -0800 Subject: [PATCH 0901/1013] net: dsa: bcm_sf2: Clear IDDQ_GLOBAL_PWR bit for PHY [ Upstream commit 4b52d010113e11006a389f2a8315167ede9e0b10 ] The PHY on BCM7278 has an additional bit that needs to be cleared: IDDQ_GLOBAL_PWR, without doing this, the PHY remains stuck in reset out of suspend/resume cycles. Fixes: 0fe9933804eb ("net: dsa: bcm_sf2: Add support for BCM7278 integrated switch") Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/dsa/bcm_sf2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c index d7b53d53c116..72d6ffbfd638 100644 --- a/drivers/net/dsa/bcm_sf2.c +++ b/drivers/net/dsa/bcm_sf2.c @@ -167,7 +167,7 @@ static void bcm_sf2_gphy_enable_set(struct dsa_switch *ds, bool enable) reg = reg_readl(priv, REG_SPHY_CNTRL); if (enable) { reg |= PHY_RESET; - reg &= ~(EXT_PWR_DOWN | IDDQ_BIAS | CK25_DIS); + reg &= ~(EXT_PWR_DOWN | IDDQ_BIAS | IDDQ_GLOBAL_PWR | CK25_DIS); reg_writel(priv, reg, REG_SPHY_CNTRL); udelay(21); reg = reg_readl(priv, REG_SPHY_CNTRL); From f38ffe325b209f367c003bab291cf7e96cd1a6d9 Mon Sep 17 00:00:00 2001 From: Julian Wiedmann Date: Wed, 20 Dec 2017 18:07:18 +0100 Subject: [PATCH 0902/1013] s390/qeth: fix error handling in checksum cmd callback [ Upstream commit ad3cbf61332914711e5f506972b1dc9af8d62146 ] Make sure to check both return code fields before processing the response. Otherwise we risk operating on invalid data. Fixes: c9475369bd2b ("s390/qeth: rework RX/TX checksum offload") Signed-off-by: Julian Wiedmann Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/s390/net/qeth_core_main.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c index 291eb89ecb06..7c7a244b6684 100644 --- a/drivers/s390/net/qeth_core_main.c +++ b/drivers/s390/net/qeth_core_main.c @@ -5445,6 +5445,13 @@ int qeth_poll(struct napi_struct *napi, int budget) } EXPORT_SYMBOL_GPL(qeth_poll); +static int qeth_setassparms_inspect_rc(struct qeth_ipa_cmd *cmd) +{ + if (!cmd->hdr.return_code) + cmd->hdr.return_code = cmd->data.setassparms.hdr.return_code; + return cmd->hdr.return_code; +} + int qeth_setassparms_cb(struct qeth_card *card, struct qeth_reply *reply, unsigned long data) { @@ -6304,7 +6311,7 @@ static int qeth_ipa_checksum_run_cmd_cb(struct qeth_card *card, (struct qeth_checksum_cmd *)reply->param; QETH_CARD_TEXT(card, 4, "chkdoccb"); - if (cmd->hdr.return_code) + if (qeth_setassparms_inspect_rc(cmd)) return 0; memset(chksum_cb, 0, sizeof(*chksum_cb)); From 201c59bb7ba69fd6a41ac6606d8049b94fb26232 Mon Sep 17 00:00:00 2001 From: Xin Long Date: Sun, 10 Dec 2017 15:40:51 +0800 Subject: [PATCH 0903/1013] sctp: make sure stream nums can match optlen in sctp_setsockopt_reset_streams [ Upstream commit 2342b8d95bcae5946e1b9b8d58645f37500ef2e7 ] Now in sctp_setsockopt_reset_streams, it only does the check optlen < sizeof(*params) for optlen. But it's not enough, as params->srs_number_streams should also match optlen. If the streams in params->srs_stream_list are less than stream nums in params->srs_number_streams, later when dereferencing the stream list, it could cause a slab-out-of-bounds crash, as reported by syzbot. This patch is to fix it by also checking the stream numbers in sctp_setsockopt_reset_streams to make sure at least it's not greater than the streams in the list. Fixes: 7f9d68ac944e ("sctp: implement sender-side procedures for SSN Reset Request Parameter") Reported-by: Dmitry Vyukov Signed-off-by: Xin Long Acked-by: Marcelo Ricardo Leitner Acked-by: Neil Horman Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sctp/socket.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/net/sctp/socket.c b/net/sctp/socket.c index 1977238dc023..df806b8819aa 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -3874,13 +3874,17 @@ static int sctp_setsockopt_reset_streams(struct sock *sk, struct sctp_association *asoc; int retval = -EINVAL; - if (optlen < sizeof(struct sctp_reset_streams)) + if (optlen < sizeof(*params)) return -EINVAL; params = memdup_user(optval, optlen); if (IS_ERR(params)) return PTR_ERR(params); + if (params->srs_number_streams * sizeof(__u16) > + optlen - sizeof(*params)) + goto out; + asoc = sctp_id2assoc(sk, params->srs_assoc_id); if (!asoc) goto out; From 92ae8233467b3a19a50fb02a7ebe065c6de3df17 Mon Sep 17 00:00:00 2001 From: Parthasarathy Bhuvaragan Date: Thu, 28 Dec 2017 12:03:06 +0100 Subject: [PATCH 0904/1013] tipc: fix hanging poll() for stream sockets [ Upstream commit 517d7c79bdb39864e617960504bdc1aa560c75c6 ] In commit 42b531de17d2f6 ("tipc: Fix missing connection request handling"), we replaced unconditional wakeup() with condtional wakeup for clients with flags POLLIN | POLLRDNORM | POLLRDBAND. This breaks the applications which do a connect followed by poll with POLLOUT flag. These applications are not woken when the connection is ESTABLISHED and hence sleep forever. 
In this commit, we fix it by including the POLLOUT event for sockets in TIPC_CONNECTING state. Fixes: 42b531de17d2f6 ("tipc: Fix missing connection request handling") Acked-by: Jon Maloy Signed-off-by: Parthasarathy Bhuvaragan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/tipc/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/tipc/socket.c b/net/tipc/socket.c index d50edd6e0019..98a44ecb11e7 100644 --- a/net/tipc/socket.c +++ b/net/tipc/socket.c @@ -709,11 +709,11 @@ static unsigned int tipc_poll(struct file *file, struct socket *sock, switch (sk->sk_state) { case TIPC_ESTABLISHED: + case TIPC_CONNECTING: if (!tsk->cong_link_cnt && !tsk_conn_cong(tsk)) mask |= POLLOUT; /* fall thru' */ case TIPC_LISTEN: - case TIPC_CONNECTING: if (!skb_queue_empty(&sk->sk_receive_queue)) mask |= (POLLIN | POLLRDNORM); break; From bcc029ff5dafc68d00a5fceadd93763d2b43e0e3 Mon Sep 17 00:00:00 2001 From: Yuval Mintz Date: Fri, 15 Dec 2017 08:44:21 +0100 Subject: [PATCH 0905/1013] mlxsw: spectrum: Disable MAC learning for ovs port [ Upstream commit fccff0862838908d21eaf956d57e09c6c189f7c5 ] Learning is currently enabled for ports which are OVS slaves - even though OVS doesn't need this indication. Since we're not associating a fid with the port, HW would continuously notify driver of learned [& aged] MACs which would be logged as errors. Fixes: 2b94e58df58c ("mlxsw: spectrum: Allow ports to work under OVS master") Signed-off-by: Yuval Mintz Reviewed-by: Ido Schimmel Signed-off-by: Jiri Pirko Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index db38880f54b4..3ead7439821c 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -4164,6 +4164,7 @@ static int mlxsw_sp_port_stp_set(struct mlxsw_sp_port *mlxsw_sp_port, static int mlxsw_sp_port_ovs_join(struct mlxsw_sp_port *mlxsw_sp_port) { + u16 vid = 1; int err; err = mlxsw_sp_port_vp_mode_set(mlxsw_sp_port, true); @@ -4176,8 +4177,19 @@ static int mlxsw_sp_port_ovs_join(struct mlxsw_sp_port *mlxsw_sp_port) true, false); if (err) goto err_port_vlan_set; + + for (; vid <= VLAN_N_VID - 1; vid++) { + err = mlxsw_sp_port_vid_learning_set(mlxsw_sp_port, + vid, false); + if (err) + goto err_vid_learning_set; + } + return 0; +err_vid_learning_set: + for (vid--; vid >= 1; vid--) + mlxsw_sp_port_vid_learning_set(mlxsw_sp_port, vid, true); err_port_vlan_set: mlxsw_sp_port_stp_set(mlxsw_sp_port, false); err_port_stp_set: @@ -4187,6 +4199,12 @@ static int mlxsw_sp_port_ovs_join(struct mlxsw_sp_port *mlxsw_sp_port) static void mlxsw_sp_port_ovs_leave(struct mlxsw_sp_port *mlxsw_sp_port) { + u16 vid; + + for (vid = VLAN_N_VID - 1; vid >= 1; vid--) + mlxsw_sp_port_vid_learning_set(mlxsw_sp_port, + vid, true); + mlxsw_sp_port_vlan_set(mlxsw_sp_port, 2, VLAN_N_VID - 1, false, false); mlxsw_sp_port_stp_set(mlxsw_sp_port, false); From 583395a81f00a42f178c1d8ba3c4858d76c9f5b4 Mon Sep 17 00:00:00 2001 From: Wei Wang Date: Tue, 12 Dec 2017 16:28:58 -0800 Subject: [PATCH 0906/1013] tcp: fix potential underestimation on rcv_rtt [ Upstream commit 9ee11bd03cb1a5c3ca33c2bb70e7ed325f68890f ] When ms timestamp is used, current logic uses 1us in tcp_rcv_rtt_update() when the real rcv_rtt is within 1 - 999us. This could cause rcv_rtt underestimation. 
Fix it by always using a min value of 1ms if ms timestamp is used. Fixes: 645f4c6f2ebd ("tcp: switch rcv_rtt_est and rcvq_space to high resolution timestamps") Signed-off-by: Wei Wang Signed-off-by: Eric Dumazet Acked-by: Neal Cardwell Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_input.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index a965a38211ba..ff48ac654e5a 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -521,9 +521,6 @@ static void tcp_rcv_rtt_update(struct tcp_sock *tp, u32 sample, int win_dep) u32 new_sample = tp->rcv_rtt_est.rtt_us; long m = sample; - if (m == 0) - m = 1; - if (new_sample != 0) { /* If we sample in larger samples in the non-timestamp * case, we could grossly overestimate the RTT especially @@ -560,6 +557,8 @@ static inline void tcp_rcv_rtt_measure(struct tcp_sock *tp) if (before(tp->rcv_nxt, tp->rcv_rtt_est.seq)) return; delta_us = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcv_rtt_est.time); + if (!delta_us) + delta_us = 1; tcp_rcv_rtt_update(tp, delta_us, 1); new_measure: @@ -576,8 +575,11 @@ static inline void tcp_rcv_rtt_measure_ts(struct sock *sk, (TCP_SKB_CB(skb)->end_seq - TCP_SKB_CB(skb)->seq >= inet_csk(sk)->icsk_ack.rcv_mss)) { u32 delta = tcp_time_stamp(tp) - tp->rx_opt.rcv_tsecr; - u32 delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ); + u32 delta_us; + if (!delta) + delta = 1; + delta_us = delta * (USEC_PER_SEC / TCP_TS_HZ); tcp_rcv_rtt_update(tp, delta_us, 0); } } From 5e255d684d0554893768b4452de404f293c71696 Mon Sep 17 00:00:00 2001 From: Zhao Qiang Date: Mon, 18 Dec 2017 10:26:43 +0800 Subject: [PATCH 0907/1013] net: phy: marvell: Limit 88m1101 autoneg errata to 88E1145 as well. [ Upstream commit c505873eaece2b4aefd07d339dc7e1400e0235ac ] 88E1145 also need this autoneg errata. Fixes: f2899788353c ("net: phy: marvell: Limit errata to 88m1101") Signed-off-by: Zhao Qiang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/marvell.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c index 4d02b27df044..a3f456b91c99 100644 --- a/drivers/net/phy/marvell.c +++ b/drivers/net/phy/marvell.c @@ -2069,7 +2069,7 @@ static struct phy_driver marvell_drivers[] = { .flags = PHY_HAS_INTERRUPT, .probe = marvell_probe, .config_init = &m88e1145_config_init, - .config_aneg = &marvell_config_aneg, + .config_aneg = &m88e1101_config_aneg, .read_status = &genphy_read_status, .ack_interrupt = &marvell_ack_interrupt, .config_intr = &marvell_config_intr, From 333921964046fe8453a90b60aa55dc4725955073 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Wed, 20 Dec 2017 12:28:25 +0200 Subject: [PATCH 0908/1013] ipv6: Honor specified parameters in fibmatch lookup [ Upstream commit 58acfd714e6b02e8617448b431c2b64a2f1f0792 ] Currently, parameters such as oif and source address are not taken into account during fibmatch lookup. 
Example (IPv4 for reference) before patch: $ ip -4 route show 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1 $ ip -6 route show 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium fe80::/64 dev dummy0 proto kernel metric 256 pref medium fe80::/64 dev dummy1 proto kernel metric 256 pref medium $ ip -4 route get fibmatch 192.0.2.2 oif dummy0 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 $ ip -4 route get fibmatch 192.0.2.2 oif dummy1 RTNETLINK answers: No route to host $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium After: $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 RTNETLINK answers: Network is unreachable The problem stems from the fact that the necessary route lookup flags are not set based on these parameters. Instead of duplicating the same logic for fibmatch, we can simply resolve the original route from its copy and dump it instead. Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel Acked-by: David Ahern Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/route.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 598efa8cfe25..ca8d3266e92e 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3700,19 +3700,13 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, if (!ipv6_addr_any(&fl6.saddr)) flags |= RT6_LOOKUP_F_HAS_SADDR; - if (!fibmatch) - dst = ip6_route_input_lookup(net, dev, &fl6, flags); - else - dst = ip6_route_lookup(net, &fl6, 0); + dst = ip6_route_input_lookup(net, dev, &fl6, flags); rcu_read_unlock(); } else { fl6.flowi6_oif = oif; - if (!fibmatch) - dst = ip6_route_output(net, NULL, &fl6); - else - dst = ip6_route_lookup(net, &fl6, 0); + dst = ip6_route_output(net, NULL, &fl6); } @@ -3729,6 +3723,15 @@ static int inet6_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh, goto errout; } + if (fibmatch && rt->dst.from) { + struct rt6_info *ort = container_of(rt->dst.from, + struct rt6_info, dst); + + dst_hold(&ort->dst); + ip6_rt_put(rt); + rt = ort; + } + skb = alloc_skb(NLMSG_GOODSIZE, GFP_KERNEL); if (!skb) { ip6_rt_put(rt); From 5504319c6993d51565b582c685c3e09c0e07fded Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 12 Dec 2017 18:22:52 -0800 Subject: [PATCH 0909/1013] tcp: refresh tcp_mstamp from timers callbacks [ Upstream commit 4688eb7cf3ae2c2721d1dacff5c1384cba47d176 ] Only the retransmit timer currently refreshes tcp_mstamp We should do the same for delayed acks and keepalives. Even if RFC 7323 does not request it, this is consistent to what linux did in the past, when TS values were based on jiffies. Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path") Signed-off-by: Eric Dumazet Cc: Soheil Hassas Yeganeh Cc: Mike Maloney Cc: Neal Cardwell Acked-by: Neal Cardwell Acked-by: Soheil Hassas Yeganeh Acked-by: Mike Maloney Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/tcp_timer.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index 655dd8d7f064..e9af1879cd53 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -264,6 +264,7 @@ void tcp_delack_timer_handler(struct sock *sk) icsk->icsk_ack.pingpong = 0; icsk->icsk_ack.ato = TCP_ATO_MIN; } + tcp_mstamp_refresh(tcp_sk(sk)); tcp_send_ack(sk); __NET_INC_STATS(sock_net(sk), LINUX_MIB_DELAYEDACKS); } @@ -627,6 +628,7 @@ static void tcp_keepalive_timer (unsigned long data) goto out; } + tcp_mstamp_refresh(tp); if (sk->sk_state == TCP_FIN_WAIT2 && sock_flag(sk, SOCK_DEAD)) { if (tp->linger2 >= 0) { const int tmo = tcp_fin_time(sk) - TCP_TIMEWAIT_LEN; From cea58617977baafe4a1be8f55c8e55d21c6cfb20 Mon Sep 17 00:00:00 2001 From: Kamal Heib Date: Sun, 29 Oct 2017 04:03:37 +0200 Subject: [PATCH 0910/1013] net/mlx5: FPGA, return -EINVAL if size is zero [ Upstream commit bae115a2bb479142605726e6aa130f43f50e801a ] Currently, if a size of zero is passed to mlx5_fpga_mem_{read|write}_i2c() the "err" return value will not be initialized, which triggers gcc warnings: [..]/mlx5/core/fpga/sdk.c:87 mlx5_fpga_mem_read_i2c() error: uninitialized symbol 'err'. [..]/mlx5/core/fpga/sdk.c:115 mlx5_fpga_mem_write_i2c() error: uninitialized symbol 'err'. fix that. Fixes: a9956d35d199 ('net/mlx5: FPGA, Add SBU infrastructure') Signed-off-by: Kamal Heib Reviewed-by: Yevgeny Kliteynik Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c index 3c11d6e2160a..14962969c5ba 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/sdk.c @@ -66,6 +66,9 @@ static int mlx5_fpga_mem_read_i2c(struct mlx5_fpga_device *fdev, size_t size, u8 actual_size; int err; + if (!size) + return -EINVAL; + if (!fdev->mdev) return -ENOTCONN; @@ -95,6 +98,9 @@ static int mlx5_fpga_mem_write_i2c(struct mlx5_fpga_device *fdev, size_t size, u8 actual_size; int err; + if (!size) + return -EINVAL; + if (!fdev->mdev) return -ENOTCONN; From 215b69e208089f42c29f9343ef11f4adc6ea6f6b Mon Sep 17 00:00:00 2001 From: Alexey Kodanev Date: Thu, 14 Dec 2017 20:20:00 +0300 Subject: [PATCH 0911/1013] vxlan: restore dev->mtu setting based on lower device [ Upstream commit f870c1ff65a6d1f3a083f277280802ee09a5b44d ] Stefano Brivio says: Commit a985343ba906 ("vxlan: refactor verification and application of configuration") introduced a change in the behaviour of initial MTU setting: earlier, the MTU for a link created on top of a given lower device, without an initial MTU specification, was set to the MTU of the lower device minus headroom as a result of this path in vxlan_dev_configure(): if (!conf->mtu) dev->mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM); which is now gone. Now, the initial MTU, in absence of a configured value, is simply set by ether_setup() to ETH_DATA_LEN (1500 bytes). This breaks userspace expectations in case the MTU of the lower device is higher than 1500 bytes minus headroom. This patch restores the previous behaviour on newlink operation. Since max_mtu can be negative and we update dev->mtu directly, also check it for valid minimum. 
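The restored behaviour, visible in the diff below, clamps the derived maximum to a valid value and uses it as the initial MTU when none was configured at newlink time:

	max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM);
	if (max_mtu < ETH_MIN_MTU)
		max_mtu = ETH_MIN_MTU;

	if (!changelink && !conf->mtu)
		dev->mtu = max_mtu;	/* default: lower device MTU minus headroom */

	if (dev->mtu > max_mtu)
		dev->mtu = max_mtu;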
Reported-by: Junhan Yan Fixes: a985343ba906 ("vxlan: refactor verification and application of configuration") Signed-off-by: Alexey Kodanev Acked-by: Stefano Brivio Signed-off-by: Stefano Brivio Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/vxlan.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c index a2f4e52fadb5..9e9202b50e73 100644 --- a/drivers/net/vxlan.c +++ b/drivers/net/vxlan.c @@ -3105,6 +3105,11 @@ static void vxlan_config_apply(struct net_device *dev, max_mtu = lowerdev->mtu - (use_ipv6 ? VXLAN6_HEADROOM : VXLAN_HEADROOM); + if (max_mtu < ETH_MIN_MTU) + max_mtu = ETH_MIN_MTU; + + if (!changelink && !conf->mtu) + dev->mtu = max_mtu; } if (dev->mtu > max_mtu) From 11295730446fd654aeef28a575d801f3679424fa Mon Sep 17 00:00:00 2001 From: Jiri Pirko Date: Fri, 15 Dec 2017 12:40:13 +0100 Subject: [PATCH 0912/1013] net: sched: fix static key imbalance in case of ingress/clsact_init error [ Upstream commit b59e6979a86384e68b0ab6ffeab11f0034fba82d ] Move static key increments to the beginning of the init function so they pair 1:1 with decrements in ingress/clsact_destroy, which is called in case ingress/clsact_init fails. Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Signed-off-by: Jiri Pirko Acked-by: Cong Wang Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/sched/sch_ingress.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/net/sched/sch_ingress.c b/net/sched/sch_ingress.c index 44de4ee51ce9..a08a32fa0949 100644 --- a/net/sched/sch_ingress.c +++ b/net/sched/sch_ingress.c @@ -59,11 +59,12 @@ static int ingress_init(struct Qdisc *sch, struct nlattr *opt) struct net_device *dev = qdisc_dev(sch); int err; + net_inc_ingress_queue(); + err = tcf_block_get(&q->block, &dev->ingress_cl_list); if (err) return err; - net_inc_ingress_queue(); sch->flags |= TCQ_F_CPUSTATS; return 0; @@ -153,6 +154,9 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt) struct net_device *dev = qdisc_dev(sch); int err; + net_inc_ingress_queue(); + net_inc_egress_queue(); + err = tcf_block_get(&q->ingress_block, &dev->ingress_cl_list); if (err) return err; @@ -161,9 +165,6 @@ static int clsact_init(struct Qdisc *sch, struct nlattr *opt) if (err) return err; - net_inc_ingress_queue(); - net_inc_egress_queue(); - sch->flags |= TCQ_F_CPUSTATS; return 0; From 7f6dcb82d04045f143758f56caf031376f9bf16c Mon Sep 17 00:00:00 2001 From: Calvin Owens Date: Fri, 8 Dec 2017 09:05:26 -0800 Subject: [PATCH 0913/1013] bnxt_en: Fix sources of spurious netpoll warnings [ Upstream commit 2edbdb3159d6f6bd3a9b6e7f789f2b879699a519 ] After applying 2270bc5da3497945 ("bnxt_en: Fix netpoll handling") and 903649e718f80da2 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."), we still see the following WARN fire: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160 bnxt_poll+0x0/0xd0 exceeded budget in poll Call Trace: [] dump_stack+0x4d/0x70 [] __warn+0xd3/0xf0 [] warn_slowpath_fmt+0x4f/0x60 [] netpoll_poll_dev+0x15a/0x160 [] netpoll_send_skb_on_dev+0x168/0x250 [] netpoll_send_udp+0x2dc/0x440 [] write_ext_msg+0x20e/0x250 [] call_console_drivers.constprop.23+0xa5/0x110 [] console_unlock+0x339/0x5b0 [] vprintk_emit+0x2c8/0x450 [] vprintk_default+0x1f/0x30 [] printk+0x48/0x50 [] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core] [] edac_mc_handle_error+0x42b/0x6e0 [edac_core] [] 
sbridge_mce_output_error+0x410/0x10d0 [sb_edac] [] sbridge_check_error+0xac/0x130 [sb_edac] [] edac_mc_workq_function+0x3c/0x90 [edac_core] [] process_one_work+0x19b/0x480 [] worker_thread+0x6a/0x520 [] kthread+0xe4/0x100 [] ret_from_fork+0x22/0x40 This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually given a non-zero budget. Signed-off-by: Calvin Owens Acked-by: Michael Chan Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index dc5de275352a..aa764c5e3c6b 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -1875,7 +1875,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct bnxt_napi *bnapi, int budget) * here forever if we consistently cannot allocate * buffers. */ - else if (rc == -ENOMEM) + else if (rc == -ENOMEM && budget) rx_pkts++; else if (rc == -EBUSY) /* partial completion */ break; @@ -1961,7 +1961,7 @@ static int bnxt_poll_nitroa0(struct napi_struct *napi, int budget) cpu_to_le32(RX_CMPL_ERRORS_CRC_ERROR); rc = bnxt_rx_pkt(bp, bnapi, &raw_cons, &event); - if (likely(rc == -EIO)) + if (likely(rc == -EIO) && budget) rx_pkts++; else if (rc == -EBUSY) /* partial completion */ break; From 39889c293371e826503b61d98050a49742ccaf9c Mon Sep 17 00:00:00 2001 From: Russell King Date: Wed, 20 Dec 2017 23:21:28 +0000 Subject: [PATCH 0914/1013] phylink: ensure the PHY interface mode is appropriately set [ Upstream commit 182088aa3c6c7f7c20a2c1dcc9ded4a3fc631f38 ] When setting the ethtool settings, ensure that the validated PHY interface mode is propagated to the current link settings, so that 2500BaseX can be selected. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Russell King Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/phylink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index bcb4755bcd95..38d81175bfd3 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -948,6 +948,7 @@ int phylink_ethtool_ksettings_set(struct phylink *pl, mutex_lock(&pl->state_mutex); /* Configure the MAC to match the new settings */ linkmode_copy(pl->link_config.advertising, our_kset.link_modes.advertising); + pl->link_config.interface = config.interface; pl->link_config.speed = our_kset.base.speed; pl->link_config.duplex = our_kset.base.duplex; pl->link_config.an_enabled = our_kset.base.autoneg != AUTONEG_DISABLE; From 185a3475dee6c14dd67d1a94178b6b5615cf167b Mon Sep 17 00:00:00 2001 From: Russell King Date: Wed, 20 Dec 2017 23:21:34 +0000 Subject: [PATCH 0915/1013] phylink: ensure AN is enabled [ Upstream commit 74ee0e8c1bf9925c59cc8f1c65c29adf6e4cf603 ] Ensure that we mark AN as enabled at boot time, rather than leaving it disabled. This is noticable if your SFP module is fiber, and it supports faster speeds than 1G with 2.5G support in place. Fixes: 9525ae83959b ("phylink: add phylink infrastructure") Signed-off-by: Russell King Reviewed-by: Florian Fainelli Signed-off-by: David S. 
Miller Signed-off-by: Greg Kroah-Hartman --- drivers/net/phy/phylink.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c index 38d81175bfd3..4b377b978a0b 100644 --- a/drivers/net/phy/phylink.c +++ b/drivers/net/phy/phylink.c @@ -525,6 +525,7 @@ struct phylink *phylink_create(struct net_device *ndev, struct device_node *np, pl->link_config.pause = MLO_PAUSE_AN; pl->link_config.speed = SPEED_UNKNOWN; pl->link_config.duplex = DUPLEX_UNKNOWN; + pl->link_config.an_enabled = true; pl->ops = ops; __set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state); From 86deaaa0ca2b0657e1f72e1087153d4ff22a312b Mon Sep 17 00:00:00 2001 From: Phil Sutter Date: Tue, 19 Dec 2017 15:17:13 +0100 Subject: [PATCH 0916/1013] ipv4: fib: Fix metrics match when deleting a route [ Upstream commit d03a45572efa068fa64db211d6d45222660e76c5 ] The recently added fib_metrics_match() causes a regression for routes with both RTAX_FEATURES and RTAX_CC_ALGO if the latter has TCP_CONG_NEEDS_ECN flag set: | # ip link add d0 type dummy | # ip link set d0 up | # ip route add 172.29.29.0/24 dev d0 features ecn congctl dctcp | # ip route del 172.29.29.0/24 dev d0 features ecn congctl dctcp | RTNETLINK answers: No such process During route insertion, fib_convert_metrics() detects that the given CC algo requires ECN and hence sets DST_FEATURE_ECN_CA bit in RTAX_FEATURES. During route deletion though, fib_metrics_match() compares stored RTAX_FEATURES value with that from userspace (which obviously has no knowledge about DST_FEATURE_ECN_CA) and fails. Fixes: 5f9ae3d9e7e4a ("ipv4: do metrics match when looking up and deleting a route") Signed-off-by: Phil Sutter Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv4/fib_semantics.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 01ed22139ac2..aff3751df950 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -706,7 +706,7 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi) nla_for_each_attr(nla, cfg->fc_mx, cfg->fc_mx_len, remaining) { int type = nla_type(nla); - u32 val; + u32 fi_val, val; if (!type) continue; @@ -723,7 +723,11 @@ bool fib_metrics_match(struct fib_config *cfg, struct fib_info *fi) val = nla_get_u32(nla); } - if (fi->fib_metrics->metrics[type - 1] != val) + fi_val = fi->fib_metrics->metrics[type - 1]; + if (type == RTAX_FEATURES) + fi_val &= ~DST_FEATURE_ECN_CA; + + if (fi_val != val) return false; } From f2272d5dce7905d605f26ba8a5f9961d05263e74 Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Tue, 14 Nov 2017 14:21:32 +0100 Subject: [PATCH 0917/1013] ipv6: set all.accept_dad to 0 by default [ Upstream commit 094009531612246d9e13f9e0c3ae2205d7f63a0a ] With commits 35e015e1f577 and a2d3f3e33853, the global 'accept_dad' flag is also taken into account (default value is 1). If either global or per-interface flag is non-zero, DAD will be enabled on a given interface. This is not backward compatible: before those patches, the user could disable DAD just by setting the per-interface flag to 0. Now, the user instead needs to set both flags to 0 to actually disable DAD. Restore the previous behaviour by setting the default for the global 'accept_dad' flag to 0. This way, DAD is still enabled by default, as per-interface flags are set to 1 on device creation, but setting them to 0 is enough to disable DAD on a given interface. 
- Before 35e015e1f57a7 and a2d3f3e33853: global per-interface DAD enabled [default] 1 1 yes X 0 no X 1 yes - After 35e015e1f577 and a2d3f3e33853: global per-interface DAD enabled [default] 1 1 yes 0 0 no 0 1 yes 1 0 yes - After this fix: global per-interface DAD enabled 1 1 yes 0 0 no [default] 0 1 yes 1 0 yes Fixes: 35e015e1f577 ("ipv6: fix net.ipv6.conf.all interface DAD handlers") Fixes: a2d3f3e33853 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real") CC: Stefano Brivio CC: Matteo Croce CC: Erik Kline Signed-off-by: Nicolas Dichtel Acked-by: Stefano Brivio Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/ipv6/addrconf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index 2ec39404c449..c5318f5f6a14 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -231,7 +231,7 @@ static struct ipv6_devconf ipv6_devconf __read_mostly = { .proxy_ndp = 0, .accept_source_route = 0, /* we do not accept RH0 by default. */ .disable_ipv6 = 0, - .accept_dad = 1, + .accept_dad = 0, .suppress_frag_ndisc = 1, .accept_ra_mtu = 1, .stable_secret = { From 8f5bbb29b62c818e16de49c0bbc945b501c157cf Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Fri, 10 Nov 2017 15:59:52 +0900 Subject: [PATCH 0918/1013] Revert "mlx5: move affinity hints assignments to generic code" [ Upstream commit 231243c82793428467524227ae02ca451e6a98e7 ] Before the offending commit, mlx5 core did the IRQ affinity itself, and it seems that the new generic code have some drawbacks and one of them is the lack for user ability to modify irq affinity after the initial affinity values got assigned. The issue is still being discussed and a solution in the new generic code is required, until then we need to revert this patch. This fixes the following issue: echo > /proc/irq//smp_affinity fails with -EIO This reverts commit a435393acafbf0ecff4deb3e3cb554b34f0d0664. Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since it is used in mlx5_ib driver. 
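The revert restores the driver-local hinting seen in the diff below: each completion vector gets a NUMA-local CPU and an affinity hint, which the user can still change through /proc/irq/*/smp_affinity:

	int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i);

	cpumask_set_cpu(cpumask_local_spread(i, priv->numa_node),
			priv->irq_info[i].mask);

	if (IS_ENABLED(CONFIG_SMP) &&
	    irq_set_affinity_hint(irq, priv->irq_info[i].mask))
		mlx5_core_warn(mdev, "irq_set_affinity_hint failed, irq 0x%.4x", irq);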
Fixes: a435393acafb ("mlx5: move affinity hints assignments to generic code") Cc: Sagi Grimberg Cc: Thomas Gleixner Cc: Jes Sorensen Reported-by: Jes Sorensen Signed-off-by: Saeed Mahameed Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 + .../net/ethernet/mellanox/mlx5/core/en_main.c | 45 ++++++----- .../net/ethernet/mellanox/mlx5/core/main.c | 75 +++++++++++++++++-- include/linux/mlx5/driver.h | 1 + 4 files changed, 93 insertions(+), 29 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h index 13b5ef9d8703..5fa071620104 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h @@ -590,6 +590,7 @@ struct mlx5e_channel { struct mlx5_core_dev *mdev; struct mlx5e_tstamp *tstamp; int ix; + int cpu; }; struct mlx5e_channels { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 517727f277c5..3cdb932cae76 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -71,11 +71,6 @@ struct mlx5e_channel_param { struct mlx5e_cq_param icosq_cq; }; -static int mlx5e_get_node(struct mlx5e_priv *priv, int ix) -{ - return pci_irq_get_node(priv->mdev->pdev, MLX5_EQ_VEC_COMP_BASE + ix); -} - static bool mlx5e_check_fragmented_striding_rq_cap(struct mlx5_core_dev *mdev) { return MLX5_CAP_GEN(mdev, striding_rq) && @@ -452,17 +447,16 @@ static int mlx5e_rq_alloc_mpwqe_info(struct mlx5e_rq *rq, int wq_sz = mlx5_wq_ll_get_size(&rq->wq); int mtt_sz = mlx5e_get_wqe_mtt_sz(); int mtt_alloc = mtt_sz + MLX5_UMR_ALIGN - 1; - int node = mlx5e_get_node(c->priv, c->ix); int i; rq->mpwqe.info = kzalloc_node(wq_sz * sizeof(*rq->mpwqe.info), - GFP_KERNEL, node); + GFP_KERNEL, cpu_to_node(c->cpu)); if (!rq->mpwqe.info) goto err_out; /* We allocate more than mtt_sz as we will align the pointer */ - rq->mpwqe.mtt_no_align = kzalloc_node(mtt_alloc * wq_sz, - GFP_KERNEL, node); + rq->mpwqe.mtt_no_align = kzalloc_node(mtt_alloc * wq_sz, GFP_KERNEL, + cpu_to_node(c->cpu)); if (unlikely(!rq->mpwqe.mtt_no_align)) goto err_free_wqe_info; @@ -570,7 +564,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, int err; int i; - rqp->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix); + rqp->wq.db_numa_node = cpu_to_node(c->cpu); err = mlx5_wq_ll_create(mdev, &rqp->wq, rqc_wq, &rq->wq, &rq->wq_ctrl); @@ -636,8 +630,7 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c, default: /* MLX5_WQ_TYPE_LINKED_LIST */ rq->wqe.frag_info = kzalloc_node(wq_sz * sizeof(*rq->wqe.frag_info), - GFP_KERNEL, - mlx5e_get_node(c->priv, c->ix)); + GFP_KERNEL, cpu_to_node(c->cpu)); if (!rq->wqe.frag_info) { err = -ENOMEM; goto err_rq_wq_destroy; @@ -1007,13 +1000,13 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, sq->uar_map = mdev->mlx5e_res.bfreg.map; sq->min_inline_mode = params->tx_min_inline_mode; - param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix); + param->wq.db_numa_node = cpu_to_node(c->cpu); err = mlx5_wq_cyc_create(mdev, ¶m->wq, sqc_wq, &sq->wq, &sq->wq_ctrl); if (err) return err; sq->wq.db = &sq->wq.db[MLX5_SND_DBR]; - err = mlx5e_alloc_xdpsq_db(sq, mlx5e_get_node(c->priv, c->ix)); + err = mlx5e_alloc_xdpsq_db(sq, cpu_to_node(c->cpu)); if (err) goto err_sq_wq_destroy; @@ -1060,13 +1053,13 @@ static int mlx5e_alloc_icosq(struct mlx5e_channel *c, sq->channel = c; sq->uar_map = mdev->mlx5e_res.bfreg.map; - param->wq.db_numa_node = mlx5e_get_node(c->priv, 
c->ix); + param->wq.db_numa_node = cpu_to_node(c->cpu); err = mlx5_wq_cyc_create(mdev, ¶m->wq, sqc_wq, &sq->wq, &sq->wq_ctrl); if (err) return err; sq->wq.db = &sq->wq.db[MLX5_SND_DBR]; - err = mlx5e_alloc_icosq_db(sq, mlx5e_get_node(c->priv, c->ix)); + err = mlx5e_alloc_icosq_db(sq, cpu_to_node(c->cpu)); if (err) goto err_sq_wq_destroy; @@ -1132,13 +1125,13 @@ static int mlx5e_alloc_txqsq(struct mlx5e_channel *c, if (MLX5_IPSEC_DEV(c->priv->mdev)) set_bit(MLX5E_SQ_STATE_IPSEC, &sq->state); - param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix); + param->wq.db_numa_node = cpu_to_node(c->cpu); err = mlx5_wq_cyc_create(mdev, ¶m->wq, sqc_wq, &sq->wq, &sq->wq_ctrl); if (err) return err; sq->wq.db = &sq->wq.db[MLX5_SND_DBR]; - err = mlx5e_alloc_txqsq_db(sq, mlx5e_get_node(c->priv, c->ix)); + err = mlx5e_alloc_txqsq_db(sq, cpu_to_node(c->cpu)); if (err) goto err_sq_wq_destroy; @@ -1510,8 +1503,8 @@ static int mlx5e_alloc_cq(struct mlx5e_channel *c, struct mlx5_core_dev *mdev = c->priv->mdev; int err; - param->wq.buf_numa_node = mlx5e_get_node(c->priv, c->ix); - param->wq.db_numa_node = mlx5e_get_node(c->priv, c->ix); + param->wq.buf_numa_node = cpu_to_node(c->cpu); + param->wq.db_numa_node = cpu_to_node(c->cpu); param->eq_ix = c->ix; err = mlx5e_alloc_cq_common(mdev, param, cq); @@ -1610,6 +1603,11 @@ static void mlx5e_close_cq(struct mlx5e_cq *cq) mlx5e_free_cq(cq); } +static int mlx5e_get_cpu(struct mlx5e_priv *priv, int ix) +{ + return cpumask_first(priv->mdev->priv.irq_info[ix].mask); +} + static int mlx5e_open_tx_cqs(struct mlx5e_channel *c, struct mlx5e_params *params, struct mlx5e_channel_param *cparam) @@ -1758,12 +1756,13 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, { struct mlx5e_cq_moder icocq_moder = {0, 0}; struct net_device *netdev = priv->netdev; + int cpu = mlx5e_get_cpu(priv, ix); struct mlx5e_channel *c; unsigned int irq; int err; int eqn; - c = kzalloc_node(sizeof(*c), GFP_KERNEL, mlx5e_get_node(priv, ix)); + c = kzalloc_node(sizeof(*c), GFP_KERNEL, cpu_to_node(cpu)); if (!c) return -ENOMEM; @@ -1771,6 +1770,7 @@ static int mlx5e_open_channel(struct mlx5e_priv *priv, int ix, c->mdev = priv->mdev; c->tstamp = &priv->tstamp; c->ix = ix; + c->cpu = cpu; c->pdev = &priv->mdev->pdev->dev; c->netdev = priv->netdev; c->mkey_be = cpu_to_be32(priv->mdev->mlx5e_res.mkey.key); @@ -1859,8 +1859,7 @@ static void mlx5e_activate_channel(struct mlx5e_channel *c) for (tc = 0; tc < c->num_tc; tc++) mlx5e_activate_txqsq(&c->sq[tc]); mlx5e_activate_rq(&c->rq); - netif_set_xps_queue(c->netdev, - mlx5_get_vector_affinity(c->priv->mdev, c->ix), c->ix); + netif_set_xps_queue(c->netdev, get_cpu_mask(c->cpu), c->ix); } static void mlx5e_deactivate_channel(struct mlx5e_channel *c) diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 06562c9a6b9c..8bfc37e4ec87 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -316,9 +316,6 @@ static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev) { struct mlx5_priv *priv = &dev->priv; struct mlx5_eq_table *table = &priv->eq_table; - struct irq_affinity irqdesc = { - .pre_vectors = MLX5_EQ_VEC_COMP_BASE, - }; int num_eqs = 1 << MLX5_CAP_GEN(dev, log_max_eq); int nvec; @@ -332,10 +329,9 @@ static int mlx5_alloc_irq_vectors(struct mlx5_core_dev *dev) if (!priv->irq_info) goto err_free_msix; - nvec = pci_alloc_irq_vectors_affinity(dev->pdev, + nvec = pci_alloc_irq_vectors(dev->pdev, MLX5_EQ_VEC_COMP_BASE + 1, nvec, - 
PCI_IRQ_MSIX | PCI_IRQ_AFFINITY, - &irqdesc); + PCI_IRQ_MSIX); if (nvec < 0) return nvec; @@ -621,6 +617,63 @@ u64 mlx5_read_internal_timer(struct mlx5_core_dev *dev) return (u64)timer_l | (u64)timer_h1 << 32; } +static int mlx5_irq_set_affinity_hint(struct mlx5_core_dev *mdev, int i) +{ + struct mlx5_priv *priv = &mdev->priv; + int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i); + + if (!zalloc_cpumask_var(&priv->irq_info[i].mask, GFP_KERNEL)) { + mlx5_core_warn(mdev, "zalloc_cpumask_var failed"); + return -ENOMEM; + } + + cpumask_set_cpu(cpumask_local_spread(i, priv->numa_node), + priv->irq_info[i].mask); + + if (IS_ENABLED(CONFIG_SMP) && + irq_set_affinity_hint(irq, priv->irq_info[i].mask)) + mlx5_core_warn(mdev, "irq_set_affinity_hint failed, irq 0x%.4x", irq); + + return 0; +} + +static void mlx5_irq_clear_affinity_hint(struct mlx5_core_dev *mdev, int i) +{ + struct mlx5_priv *priv = &mdev->priv; + int irq = pci_irq_vector(mdev->pdev, MLX5_EQ_VEC_COMP_BASE + i); + + irq_set_affinity_hint(irq, NULL); + free_cpumask_var(priv->irq_info[i].mask); +} + +static int mlx5_irq_set_affinity_hints(struct mlx5_core_dev *mdev) +{ + int err; + int i; + + for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++) { + err = mlx5_irq_set_affinity_hint(mdev, i); + if (err) + goto err_out; + } + + return 0; + +err_out: + for (i--; i >= 0; i--) + mlx5_irq_clear_affinity_hint(mdev, i); + + return err; +} + +static void mlx5_irq_clear_affinity_hints(struct mlx5_core_dev *mdev) +{ + int i; + + for (i = 0; i < mdev->priv.eq_table.num_comp_vectors; i++) + mlx5_irq_clear_affinity_hint(mdev, i); +} + int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn, unsigned int *irqn) { @@ -1093,6 +1146,12 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, goto err_stop_eqs; } + err = mlx5_irq_set_affinity_hints(dev); + if (err) { + dev_err(&pdev->dev, "Failed to alloc affinity hint cpumask\n"); + goto err_affinity_hints; + } + err = mlx5_init_fs(dev); if (err) { dev_err(&pdev->dev, "Failed to init flow steering\n"); @@ -1150,6 +1209,9 @@ static int mlx5_load_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, mlx5_cleanup_fs(dev); err_fs: + mlx5_irq_clear_affinity_hints(dev); + +err_affinity_hints: free_comp_eqs(dev); err_stop_eqs: @@ -1218,6 +1280,7 @@ static int mlx5_unload_one(struct mlx5_core_dev *dev, struct mlx5_priv *priv, mlx5_sriov_detach(dev); mlx5_cleanup_fs(dev); + mlx5_irq_clear_affinity_hints(dev); free_comp_eqs(dev); mlx5_stop_eqs(dev); mlx5_put_uars_page(dev, priv->uar); diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 401c8972cc3a..8b3d0103c03a 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -546,6 +546,7 @@ struct mlx5_core_sriov { }; struct mlx5_irq_info { + cpumask_var_t mask; char name[MLX5_MAX_IRQ_NAME]; }; From 17155ea827b2fd81330a442ed56d0edafd9969e1 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Wed, 20 Dec 2017 17:37:49 -0500 Subject: [PATCH 0919/1013] skbuff: orphan frags before zerocopy clone [ Upstream commit 268b790679422a89e9ab0685d9f291edae780c98 ] Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data. skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. 
For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect. Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref_>refcount to drop below zero. Link: http://lkml.kernel.org/r/ Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY") Reported-by: Andreas Hartmann Reported-by: David Hill Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 623143d7844d..3c2e1db62eed 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -3657,8 +3657,6 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, skb_shinfo(nskb)->tx_flags |= skb_shinfo(head_skb)->tx_flags & SKBTX_SHARED_FRAG; - if (skb_zerocopy_clone(nskb, head_skb, GFP_ATOMIC)) - goto err; while (pos < offset + len) { if (i >= nfrags) { @@ -3684,6 +3682,8 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, if (unlikely(skb_orphan_frags(frag_skb, GFP_ATOMIC))) goto err; + if (skb_zerocopy_clone(nskb, frag_skb, GFP_ATOMIC)) + goto err; *nskb_frag = *frag; __skb_frag_ref(nskb_frag); From 49cd180d4a104159f240a9a896e5f13844227378 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Wed, 20 Dec 2017 17:37:50 -0500 Subject: [PATCH 0920/1013] skbuff: skb_copy_ubufs must release uarg even without user frags [ Upstream commit b90ddd568792bcb0054eaf0f61785c8f80c3bd1c ] skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner. Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data. Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages") Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 3c2e1db62eed..4a10e96c6efc 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1182,7 +1182,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) u32 d_off; if (!num_frags) - return 0; + goto release; if (skb_shared(skb) || skb_unclone(skb, gfp_mask)) return -EINVAL; @@ -1242,6 +1242,7 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) __skb_fill_page_desc(skb, new_frags - 1, head, 0, d_off); skb_shinfo(skb)->nr_frags = new_frags; +release: skb_zcopy_clear(skb, false); return 0; } From b9edd6bf0ccb2939a7716b80d3e73a24ead9fe62 Mon Sep 17 00:00:00 2001 From: Willem de Bruijn Date: Thu, 28 Dec 2017 12:38:13 -0500 Subject: [PATCH 0921/1013] skbuff: in skb_copy_ubufs unclone before releasing zerocopy skb_copy_ubufs must unclone before it is safe to modify its skb_shared_info with skb_zcopy_clear. Commit b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags") ensures that all skbs release their zerocopy state, even those without frags. But I forgot an edge case where such an skb arrives that is cloned. The stack does not build such packets. Vhost/tun skbs have their frags orphaned before cloning. TCP skbs only attach zerocopy state when a frag is added. But if TCP packets can be trimmed or linearized, this might occur. Tracing the code I found no instance so far (e.g., skb_linearize ends up calling skb_zcopy_clear if !skb->data_len). 
Still, it is non-obvious that no path exists. And it is fragile to rely on this. Fixes: b90ddd568792 ("skbuff: skb_copy_ubufs must release uarg even without user frags") Signed-off-by: Willem de Bruijn Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- net/core/skbuff.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 4a10e96c6efc..15fa5baa8fae 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -1181,12 +1181,12 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask) int i, new_frags; u32 d_off; - if (!num_frags) - goto release; - if (skb_shared(skb) || skb_unclone(skb, gfp_mask)) return -EINVAL; + if (!num_frags) + goto release; + new_frags = (__skb_pagelen(skb) + PAGE_SIZE - 1) >> PAGE_SHIFT; for (i = 0; i < new_frags; i++) { page = alloc_page(gfp_mask); From 38e8981d5490b2908136baea1f9e3b84a87aec01 Mon Sep 17 00:00:00 2001 From: Jan Engelhardt Date: Mon, 25 Dec 2017 03:43:53 +0100 Subject: [PATCH 0922/1013] sparc64: repair calling incorrect hweight function from stubs [ Upstream commit 59585b4be9ae4dc6506551709bdcd6f5210b8a01 ] Commit v4.12-rc4-1-g9289ea7f952b introduced a mistake that made the 64-bit hweight stub call the 16-bit hweight function. Fixes: 9289ea7f952b ("sparc64: Use indirect calls in hamming weight stubs") Signed-off-by: Jan Engelhardt Signed-off-by: David S. Miller Signed-off-by: Greg Kroah-Hartman --- arch/sparc/lib/hweight.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/sparc/lib/hweight.S b/arch/sparc/lib/hweight.S index e5547b22cd18..0ddbbb031822 100644 --- a/arch/sparc/lib/hweight.S +++ b/arch/sparc/lib/hweight.S @@ -44,8 +44,8 @@ EXPORT_SYMBOL(__arch_hweight32) .previous ENTRY(__arch_hweight64) - sethi %hi(__sw_hweight16), %g1 - jmpl %g1 + %lo(__sw_hweight16), %g0 + sethi %hi(__sw_hweight64), %g1 + jmpl %g1 + %lo(__sw_hweight64), %g0 nop ENDPROC(__arch_hweight64) EXPORT_SYMBOL(__arch_hweight64) From 2ee11dcfc9e1c894b058de64fb8b693cf697fbbe Mon Sep 17 00:00:00 2001 From: Juan Zea Date: Fri, 15 Dec 2017 10:21:20 +0100 Subject: [PATCH 0923/1013] usbip: fix usbip bind writing random string after command in match_busid commit 544c4605acc5ae4afe7dd5914147947db182f2fb upstream. usbip bind writes commands followed by random string when writing to match_busid attribute in sysfs, caused by using full variable size instead of string length. 
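A minimal userspace sketch of the bug (illustration only, not part of the patch; the buffer size and bus id below are made-up stand-ins for SYSFS_BUS_ID_SIZE + 4 and a real busid): passing sizeof(command) to the sysfs write sends whatever stack bytes follow the terminating NUL, while using the snprintf() return value bounds the write to the command string itself, which is what the fix does.

#include <stdio.h>

int main(void)
{
	char command[32 + 4];      /* stand-in for SYSFS_BUS_ID_SIZE + 4 */
	const char *busid = "1-2"; /* made-up bus id */

	/* snprintf() returns the string length, excluding the NUL */
	int cmd_size = snprintf(command, sizeof(command), "add %s", busid);

	/* Old code wrote sizeof(command) bytes: the 7-byte command plus
	 * 29 trailing bytes (the terminator and uninitialized stack data).
	 * The fixed code writes only cmd_size bytes. */
	printf("write %d bytes instead of %zu\n", cmd_size, sizeof(command));
	return 0;
}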
Signed-off-by: Juan Zea Acked-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- tools/usb/usbip/src/utils.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/tools/usb/usbip/src/utils.c b/tools/usb/usbip/src/utils.c index 2b3d6d235015..3d7b42e77299 100644 --- a/tools/usb/usbip/src/utils.c +++ b/tools/usb/usbip/src/utils.c @@ -30,6 +30,7 @@ int modify_match_busid(char *busid, int add) char command[SYSFS_BUS_ID_SIZE + 4]; char match_busid_attr_path[SYSFS_PATH_MAX]; int rc; + int cmd_size; snprintf(match_busid_attr_path, sizeof(match_busid_attr_path), "%s/%s/%s/%s/%s/%s", SYSFS_MNT_PATH, SYSFS_BUS_NAME, @@ -37,12 +38,14 @@ int modify_match_busid(char *busid, int add) attr_name); if (add) - snprintf(command, SYSFS_BUS_ID_SIZE + 4, "add %s", busid); + cmd_size = snprintf(command, SYSFS_BUS_ID_SIZE + 4, "add %s", + busid); else - snprintf(command, SYSFS_BUS_ID_SIZE + 4, "del %s", busid); + cmd_size = snprintf(command, SYSFS_BUS_ID_SIZE + 4, "del %s", + busid); rc = write_sysfs_attribute(match_busid_attr_path, command, - sizeof(command)); + cmd_size); if (rc < 0) { dbg("failed to write match_busid: %s", strerror(errno)); return -1; From 6b8335e48ae7171206eda3ae572f0cd0c75ca8a1 Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Fri, 15 Dec 2017 10:50:09 -0700 Subject: [PATCH 0924/1013] usbip: prevent leaking socket pointer address in messages commit 90120d15f4c397272aaf41077960a157fc4212bf upstream. usbip driver is leaking socket pointer address in messages. Remove the messages that aren't useful and print sockfd in the ones that are useful for debugging. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/stub_dev.c | 3 +-- drivers/usb/usbip/usbip_common.c | 16 +++++----------- drivers/usb/usbip/vhci_hcd.c | 2 +- 3 files changed, 7 insertions(+), 14 deletions(-) diff --git a/drivers/usb/usbip/stub_dev.c b/drivers/usb/usbip/stub_dev.c index c653ce533430..720408d39f11 100644 --- a/drivers/usb/usbip/stub_dev.c +++ b/drivers/usb/usbip/stub_dev.c @@ -163,8 +163,7 @@ static void stub_shutdown_connection(struct usbip_device *ud) * step 1? 
*/ if (ud->tcp_socket) { - dev_dbg(&sdev->udev->dev, "shutdown tcp_socket %p\n", - ud->tcp_socket); + dev_dbg(&sdev->udev->dev, "shutdown sockfd %d\n", ud->sockfd); kernel_sock_shutdown(ud->tcp_socket, SHUT_RDWR); } diff --git a/drivers/usb/usbip/usbip_common.c b/drivers/usb/usbip/usbip_common.c index 2281f3562870..17b599b923f3 100644 --- a/drivers/usb/usbip/usbip_common.c +++ b/drivers/usb/usbip/usbip_common.c @@ -331,26 +331,20 @@ int usbip_recv(struct socket *sock, void *buf, int size) struct msghdr msg = {.msg_flags = MSG_NOSIGNAL}; int total = 0; + if (!sock || !buf || !size) + return -EINVAL; + iov_iter_kvec(&msg.msg_iter, READ|ITER_KVEC, &iov, 1, size); usbip_dbg_xmit("enter\n"); - if (!sock || !buf || !size) { - pr_err("invalid arg, sock %p buff %p size %d\n", sock, buf, - size); - return -EINVAL; - } - do { - int sz = msg_data_left(&msg); + msg_data_left(&msg); sock->sk->sk_allocation = GFP_NOIO; result = sock_recvmsg(sock, &msg, MSG_WAITALL); - if (result <= 0) { - pr_debug("receive sock %p buf %p size %u ret %d total %d\n", - sock, buf + total, sz, result, total); + if (result <= 0) goto err; - } total += result; } while (msg_data_left(&msg)); diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c index 1f0cf81cc145..4eec1d796a2d 100644 --- a/drivers/usb/usbip/vhci_hcd.c +++ b/drivers/usb/usbip/vhci_hcd.c @@ -989,7 +989,7 @@ static void vhci_shutdown_connection(struct usbip_device *ud) /* need this? see stub_dev.c */ if (ud->tcp_socket) { - pr_debug("shutdown tcp_socket %p\n", ud->tcp_socket); + pr_debug("shutdown tcp_socket %d\n", ud->sockfd); kernel_sock_shutdown(ud->tcp_socket, SHUT_RDWR); } From ed4db9a7f8cb9e83605e63e7656a6efb56846ebb Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Mon, 18 Dec 2017 17:23:37 -0700 Subject: [PATCH 0925/1013] usbip: stub: stop printing kernel pointer addresses in messages commit 248a22044366f588d46754c54dfe29ffe4f8b4df upstream. Remove and/or change debug, info. and error messages to not print kernel pointer addresses. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/stub_main.c | 5 +++-- drivers/usb/usbip/stub_rx.c | 7 ++----- drivers/usb/usbip/stub_tx.c | 6 +++--- 3 files changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/usb/usbip/stub_main.c b/drivers/usb/usbip/stub_main.c index 7170404e8979..6968c906fa29 100644 --- a/drivers/usb/usbip/stub_main.c +++ b/drivers/usb/usbip/stub_main.c @@ -251,11 +251,12 @@ void stub_device_cleanup_urbs(struct stub_device *sdev) struct stub_priv *priv; struct urb *urb; - dev_dbg(&sdev->udev->dev, "free sdev %p\n", sdev); + dev_dbg(&sdev->udev->dev, "Stub device cleaning up urbs\n"); while ((priv = stub_priv_pop(sdev))) { urb = priv->urb; - dev_dbg(&sdev->udev->dev, "free urb %p\n", urb); + dev_dbg(&sdev->udev->dev, "free urb seqnum %lu\n", + priv->seqnum); usb_kill_urb(urb); kmem_cache_free(stub_priv_cache, priv); diff --git a/drivers/usb/usbip/stub_rx.c b/drivers/usb/usbip/stub_rx.c index 283a9be77a22..5b807185f79e 100644 --- a/drivers/usb/usbip/stub_rx.c +++ b/drivers/usb/usbip/stub_rx.c @@ -225,9 +225,6 @@ static int stub_recv_cmd_unlink(struct stub_device *sdev, if (priv->seqnum != pdu->u.cmd_unlink.seqnum) continue; - dev_info(&priv->urb->dev->dev, "unlink urb %p\n", - priv->urb); - /* * This matched urb is not completed yet (i.e., be in * flight in usb hcd hardware/driver). 
Now we are @@ -266,8 +263,8 @@ static int stub_recv_cmd_unlink(struct stub_device *sdev, ret = usb_unlink_urb(priv->urb); if (ret != -EINPROGRESS) dev_err(&priv->urb->dev->dev, - "failed to unlink a urb %p, ret %d\n", - priv->urb, ret); + "failed to unlink a urb # %lu, ret %d\n", + priv->seqnum, ret); return 0; } diff --git a/drivers/usb/usbip/stub_tx.c b/drivers/usb/usbip/stub_tx.c index 87ff94be4235..96aa375b80d9 100644 --- a/drivers/usb/usbip/stub_tx.c +++ b/drivers/usb/usbip/stub_tx.c @@ -102,7 +102,7 @@ void stub_complete(struct urb *urb) /* link a urb to the queue of tx. */ spin_lock_irqsave(&sdev->priv_lock, flags); if (sdev->ud.tcp_socket == NULL) { - usbip_dbg_stub_tx("ignore urb for closed connection %p", urb); + usbip_dbg_stub_tx("ignore urb for closed connection\n"); /* It will be freed in stub_device_cleanup_urbs(). */ } else if (priv->unlinking) { stub_enqueue_ret_unlink(sdev, priv->seqnum, urb->status); @@ -204,8 +204,8 @@ static int stub_send_ret_submit(struct stub_device *sdev) /* 1. setup usbip_header */ setup_ret_submit_pdu(&pdu_header, urb); - usbip_dbg_stub_tx("setup txdata seqnum: %d urb: %p\n", - pdu_header.base.seqnum, urb); + usbip_dbg_stub_tx("setup txdata seqnum: %d\n", + pdu_header.base.seqnum); usbip_header_correct_endian(&pdu_header, 1); iov[iovnum].iov_base = &pdu_header; From 2d6483bf78f859b827ea97a16b68cf64c3b56e67 Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Mon, 18 Dec 2017 17:24:22 -0700 Subject: [PATCH 0926/1013] usbip: vhci: stop printing kernel pointer addresses in messages commit 8272d099d05f7ab2776cf56a2ab9f9443be18907 upstream. Remove and/or change debug, info. and error messages to not print kernel pointer addresses. Signed-off-by: Shuah Khan Signed-off-by: Greg Kroah-Hartman --- drivers/usb/usbip/vhci_hcd.c | 10 ---------- drivers/usb/usbip/vhci_rx.c | 23 +++++++++++------------ drivers/usb/usbip/vhci_tx.c | 3 ++- 3 files changed, 13 insertions(+), 23 deletions(-) diff --git a/drivers/usb/usbip/vhci_hcd.c b/drivers/usb/usbip/vhci_hcd.c index 4eec1d796a2d..692cfdef667e 100644 --- a/drivers/usb/usbip/vhci_hcd.c +++ b/drivers/usb/usbip/vhci_hcd.c @@ -670,9 +670,6 @@ static int vhci_urb_enqueue(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flag struct vhci_device *vdev; unsigned long flags; - usbip_dbg_vhci_hc("enter, usb_hcd %p urb %p mem_flags %d\n", - hcd, urb, mem_flags); - if (portnum > VHCI_HC_PORTS) { pr_err("invalid port number %d\n", portnum); return -ENODEV; @@ -836,8 +833,6 @@ static int vhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) struct vhci_device *vdev; unsigned long flags; - pr_info("dequeue a urb %p\n", urb); - spin_lock_irqsave(&vhci->lock, flags); priv = urb->hcpriv; @@ -865,7 +860,6 @@ static int vhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) /* tcp connection is closed */ spin_lock(&vdev->priv_lock); - pr_info("device %p seems to be disconnected\n", vdev); list_del(&priv->list); kfree(priv); urb->hcpriv = NULL; @@ -877,8 +871,6 @@ static int vhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) * vhci_rx will receive RET_UNLINK and give back the URB. * Otherwise, we give back it here. 
*/ - pr_info("gives back urb %p\n", urb); - usb_hcd_unlink_urb_from_ep(hcd, urb); spin_unlock_irqrestore(&vhci->lock, flags); @@ -906,8 +898,6 @@ static int vhci_urb_dequeue(struct usb_hcd *hcd, struct urb *urb, int status) unlink->unlink_seqnum = priv->seqnum; - pr_info("device %p seems to be still connected\n", vdev); - /* send cmd_unlink and try to cancel the pending URB in the * peer */ list_add_tail(&unlink->list, &vdev->unlink_tx); diff --git a/drivers/usb/usbip/vhci_rx.c b/drivers/usb/usbip/vhci_rx.c index ef2f2d5ca6b2..1343037d00f9 100644 --- a/drivers/usb/usbip/vhci_rx.c +++ b/drivers/usb/usbip/vhci_rx.c @@ -37,24 +37,23 @@ struct urb *pickup_urb_and_free_priv(struct vhci_device *vdev, __u32 seqnum) urb = priv->urb; status = urb->status; - usbip_dbg_vhci_rx("find urb %p vurb %p seqnum %u\n", - urb, priv, seqnum); + usbip_dbg_vhci_rx("find urb seqnum %u\n", seqnum); switch (status) { case -ENOENT: /* fall through */ case -ECONNRESET: - dev_info(&urb->dev->dev, - "urb %p was unlinked %ssynchronuously.\n", urb, - status == -ENOENT ? "" : "a"); + dev_dbg(&urb->dev->dev, + "urb seq# %u was unlinked %ssynchronuously\n", + seqnum, status == -ENOENT ? "" : "a"); break; case -EINPROGRESS: /* no info output */ break; default: - dev_info(&urb->dev->dev, - "urb %p may be in a error, status %d\n", urb, - status); + dev_dbg(&urb->dev->dev, + "urb seq# %u may be in a error, status %d\n", + seqnum, status); } list_del(&priv->list); @@ -81,8 +80,8 @@ static void vhci_recv_ret_submit(struct vhci_device *vdev, spin_unlock_irqrestore(&vdev->priv_lock, flags); if (!urb) { - pr_err("cannot find a urb of seqnum %u\n", pdu->base.seqnum); - pr_info("max seqnum %d\n", + pr_err("cannot find a urb of seqnum %u max seqnum %d\n", + pdu->base.seqnum, atomic_read(&vhci_hcd->seqnum)); usbip_event_add(ud, VDEV_EVENT_ERROR_TCP); return; @@ -105,7 +104,7 @@ static void vhci_recv_ret_submit(struct vhci_device *vdev, if (usbip_dbg_flag_vhci_rx) usbip_dump_urb(urb); - usbip_dbg_vhci_rx("now giveback urb %p\n", urb); + usbip_dbg_vhci_rx("now giveback urb %u\n", pdu->base.seqnum); spin_lock_irqsave(&vhci->lock, flags); usb_hcd_unlink_urb_from_ep(vhci_hcd_to_hcd(vhci_hcd), urb); @@ -172,7 +171,7 @@ static void vhci_recv_ret_unlink(struct vhci_device *vdev, pr_info("the urb (seqnum %d) was already given back\n", pdu->base.seqnum); } else { - usbip_dbg_vhci_rx("now giveback urb %p\n", urb); + usbip_dbg_vhci_rx("now giveback urb %d\n", pdu->base.seqnum); /* If unlink is successful, status is -ECONNRESET */ urb->status = pdu->u.ret_unlink.status; diff --git a/drivers/usb/usbip/vhci_tx.c b/drivers/usb/usbip/vhci_tx.c index 3e7878fe2fd4..a9a663a578b6 100644 --- a/drivers/usb/usbip/vhci_tx.c +++ b/drivers/usb/usbip/vhci_tx.c @@ -83,7 +83,8 @@ static int vhci_send_cmd_submit(struct vhci_device *vdev) memset(&msg, 0, sizeof(msg)); memset(&iov, 0, sizeof(iov)); - usbip_dbg_vhci_tx("setup txdata urb %p\n", urb); + usbip_dbg_vhci_tx("setup txdata urb seqnum %lu\n", + priv->seqnum); /* 1. setup usbip_header */ setup_cmd_submit_pdu(&pdu_header, urb); From 1fcd9859a4b3f1df7316ff01a9993695ab929727 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Mon, 13 Nov 2017 11:12:58 +0100 Subject: [PATCH 0927/1013] USB: chipidea: msm: fix ulpi-node lookup commit 964728f9f407eca0b417fdf8e784b7a76979490c upstream. Fix child-node lookup during probe, which ended up searching the whole device tree depth-first starting at the parent rather than just matching on its children. 
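As a rough sketch of the API difference (illustrative only; the helper name below is hypothetical, not the driver's code): of_find_node_by_name() walks the whole tree depth-first starting after the node it is given and drops a reference on that node, so it can return an unrelated "ulpi" node from elsewhere in the device tree, whereas of_get_child_by_name() matches only direct children and leaves the parent's refcount untouched.

#include <linux/of.h>

/* hypothetical helper, for illustration */
static struct device_node *lookup_ulpi_child(struct device_node *parent)
{
	/* pre-fix pattern: tree-wide, depth-first search */
	/* return of_find_node_by_name(of_node_get(parent), "ulpi"); */

	/* fixed pattern: restrict the match to direct children; the
	 * caller owns the returned reference and must drop it with
	 * of_node_put() when done. */
	return of_get_child_by_name(parent, "ulpi");
}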
Note that the original premature free of the parent node has already been fixed separately, but that fix was apparently never backported to stable. Fixes: 47654a162081 ("usb: chipidea: msm: Restore wrapper settings after reset") Fixes: b74c43156c0c ("usb: chipidea: msm: ci_hdrc_msm_probe() missing of_node_get()") Cc: Stephen Boyd Cc: Frank Rowand Signed-off-by: Johan Hovold Signed-off-by: Peter Chen Signed-off-by: Greg Kroah-Hartman --- drivers/usb/chipidea/ci_hdrc_msm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/chipidea/ci_hdrc_msm.c b/drivers/usb/chipidea/ci_hdrc_msm.c index bb626120296f..53f3bf459dd1 100644 --- a/drivers/usb/chipidea/ci_hdrc_msm.c +++ b/drivers/usb/chipidea/ci_hdrc_msm.c @@ -251,7 +251,7 @@ static int ci_hdrc_msm_probe(struct platform_device *pdev) if (ret) goto err_mux; - ulpi_node = of_find_node_by_name(of_node_get(pdev->dev.of_node), "ulpi"); + ulpi_node = of_get_child_by_name(pdev->dev.of_node, "ulpi"); if (ulpi_node) { phy_node = of_get_next_available_child(ulpi_node, NULL); ci->hsic = of_device_is_compatible(phy_node, "qcom,usb-hsic-phy"); From ad61ff29f1049448fa389ba9ad464e49bdcdd61c Mon Sep 17 00:00:00 2001 From: Max Schulze Date: Wed, 20 Dec 2017 20:47:44 +0100 Subject: [PATCH 0928/1013] USB: serial: ftdi_sio: add id for Airbus DS P8GR commit c6a36ad383559a60a249aa6016cebf3cb8b6c485 upstream. Add AIRBUS_DS_P8GR device IDs to ftdi_sio driver. Signed-off-by: Max Schulze Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/ftdi_sio.c | 1 + drivers/usb/serial/ftdi_sio_ids.h | 6 ++++++ 2 files changed, 7 insertions(+) diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c index 49d1b2d4606d..d038e543c246 100644 --- a/drivers/usb/serial/ftdi_sio.c +++ b/drivers/usb/serial/ftdi_sio.c @@ -1017,6 +1017,7 @@ static const struct usb_device_id id_table_combined[] = { .driver_info = (kernel_ulong_t)&ftdi_jtag_quirk }, { USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_BT_USB_PID) }, { USB_DEVICE(CYPRESS_VID, CYPRESS_WICED_WL_USB_PID) }, + { USB_DEVICE(AIRBUS_DS_VID, AIRBUS_DS_P8GR) }, { } /* Terminating entry */ }; diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h index 4faa09fe308c..8b4ecd2bd297 100644 --- a/drivers/usb/serial/ftdi_sio_ids.h +++ b/drivers/usb/serial/ftdi_sio_ids.h @@ -914,6 +914,12 @@ #define ICPDAS_I7561U_PID 0x0104 #define ICPDAS_I7563U_PID 0x0105 +/* + * Airbus Defence and Space + */ +#define AIRBUS_DS_VID 0x1e8e /* Vendor ID */ +#define AIRBUS_DS_P8GR 0x6001 /* Tetra P8GR */ + /* * RT Systems programming cables for various ham radios */ From eb6cc0af22a3b0c982e3a61de63736b2c667a94e Mon Sep 17 00:00:00 2001 From: Reinhard Speyerer Date: Fri, 15 Dec 2017 00:39:27 +0100 Subject: [PATCH 0929/1013] USB: serial: qcserial: add Sierra Wireless EM7565 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 92a18a657fb2e2ffbfa0659af32cc18fd2346516 upstream. Sierra Wireless EM7565 devices use the QCSERIAL_SWI layout for their serial ports T: Bus=01 Lev=03 Prnt=29 Port=01 Cnt=02 Dev#= 31 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1199 ProdID=9091 Rev= 0.06 S: Manufacturer=Sierra Wireless, Incorporated S: Product=Sierra Wireless EM7565 Qualcomm Snapdragon X16 LTE-A S: SerialNumber=xxxxxxxx C:* #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) 
Sub=ff Prot=ff Driver=qcserial E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=qcserial E: Ad=83(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=qcserial E: Ad=85(I) Atr=03(Int.) MxPS= 10 Ivl=32ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 8 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=86(I) Atr=03(Int.) MxPS= 8 Ivl=32ms E: Ad=8e(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=0f(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms but need sendsetup = true for the NMEA port to make it work properly. Simplify the patch compared to v1 as suggested by Bjørn Mork by taking advantage of the fact that existing devices work with sendsetup = true too. Use sendsetup = true for the NMEA interface of QCSERIAL_SWI and add DEVICE_SWI entries for the EM7565 PID 0x9091 and the EM7565 QDL PID 0x9090. Tests with several MC73xx/MC74xx/MC77xx devices have been performed in order to verify backward compatibility. Signed-off-by: Reinhard Speyerer Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/qcserial.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/serial/qcserial.c b/drivers/usb/serial/qcserial.c index 9f9d3a904464..55a8fb25ce2b 100644 --- a/drivers/usb/serial/qcserial.c +++ b/drivers/usb/serial/qcserial.c @@ -166,6 +166,8 @@ static const struct usb_device_id id_table[] = { {DEVICE_SWI(0x1199, 0x9079)}, /* Sierra Wireless EM74xx */ {DEVICE_SWI(0x1199, 0x907a)}, /* Sierra Wireless EM74xx QDL */ {DEVICE_SWI(0x1199, 0x907b)}, /* Sierra Wireless EM74xx */ + {DEVICE_SWI(0x1199, 0x9090)}, /* Sierra Wireless EM7565 QDL */ + {DEVICE_SWI(0x1199, 0x9091)}, /* Sierra Wireless EM7565 */ {DEVICE_SWI(0x413c, 0x81a2)}, /* Dell Wireless 5806 Gobi(TM) 4G LTE Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81a3)}, /* Dell Wireless 5570 HSPA+ (42Mbps) Mobile Broadband Card */ {DEVICE_SWI(0x413c, 0x81a4)}, /* Dell Wireless 5570e HSPA+ (42Mbps) Mobile Broadband Card */ @@ -346,6 +348,7 @@ static int qcprobe(struct usb_serial *serial, const struct usb_device_id *id) break; case 2: dev_dbg(dev, "NMEA GPS interface found\n"); + sendsetup = true; break; case 3: dev_dbg(dev, "Modem port found\n"); From 9fa3c3b5598eec2040fa466941a59bfc15f03f17 Mon Sep 17 00:00:00 2001 From: Daniele Palmas Date: Thu, 14 Dec 2017 16:54:45 +0100 Subject: [PATCH 0930/1013] USB: serial: option: add support for Telit ME910 PID 0x1101 commit 08933099e6404f588f81c2050bfec7313e06eeaf upstream. This patch adds support for PID 0x1101 of Telit ME910. 
Signed-off-by: Daniele Palmas Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 54e316b1892d..08684e0bc7ab 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -283,6 +283,7 @@ static void option_instat_callback(struct urb *urb); #define TELIT_PRODUCT_LE922_USBCFG3 0x1043 #define TELIT_PRODUCT_LE922_USBCFG5 0x1045 #define TELIT_PRODUCT_ME910 0x1100 +#define TELIT_PRODUCT_ME910_DUAL_MODEM 0x1101 #define TELIT_PRODUCT_LE920 0x1200 #define TELIT_PRODUCT_LE910 0x1201 #define TELIT_PRODUCT_LE910_USBCFG4 0x1206 @@ -648,6 +649,11 @@ static const struct option_blacklist_info telit_me910_blacklist = { .reserved = BIT(1) | BIT(3), }; +static const struct option_blacklist_info telit_me910_dual_modem_blacklist = { + .sendsetup = BIT(0), + .reserved = BIT(3), +}; + static const struct option_blacklist_info telit_le910_blacklist = { .sendsetup = BIT(0), .reserved = BIT(1) | BIT(2), @@ -1247,6 +1253,8 @@ static const struct usb_device_id option_ids[] = { .driver_info = (kernel_ulong_t)&telit_le922_blacklist_usbcfg0 }, { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910), .driver_info = (kernel_ulong_t)&telit_me910_blacklist }, + { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910_DUAL_MODEM), + .driver_info = (kernel_ulong_t)&telit_me910_dual_modem_blacklist }, { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910), .driver_info = (kernel_ulong_t)&telit_le910_blacklist }, { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_LE910_USBCFG4), From 57211f0cf174f8fdf3dffcff99fad25f376bb2b5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?SZ=20Lin=20=28=E6=9E=97=E4=B8=8A=E6=99=BA=29?= Date: Tue, 19 Dec 2017 17:40:32 +0800 Subject: [PATCH 0931/1013] USB: serial: option: adding support for YUGA CLM920-NC5 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3920bb713038810f25770e7545b79f204685c8f2 upstream. This patch adds support for YUGA CLM920-NC5 PID 0x9625 USB modem to option driver. 
Interface layout: 0: QCDM/DIAG 1: ADB 2: MODEM 3: AT 4: RMNET Signed-off-by: Taiyi Wu Signed-off-by: SZ Lin (林上智) Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/usb/serial/option.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/usb/serial/option.c b/drivers/usb/serial/option.c index 08684e0bc7ab..a9400458ccea 100644 --- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -236,6 +236,8 @@ static void option_instat_callback(struct urb *urb); /* These Quectel products use Qualcomm's vendor ID */ #define QUECTEL_PRODUCT_UC20 0x9003 #define QUECTEL_PRODUCT_UC15 0x9090 +/* These Yuga products use Qualcomm's vendor ID */ +#define YUGA_PRODUCT_CLM920_NC5 0x9625 #define QUECTEL_VENDOR_ID 0x2c7c /* These Quectel products use Quectel's vendor ID */ @@ -683,6 +685,10 @@ static const struct option_blacklist_info cinterion_rmnet2_blacklist = { .reserved = BIT(4) | BIT(5), }; +static const struct option_blacklist_info yuga_clm920_nc5_blacklist = { + .reserved = BIT(1) | BIT(4), +}; + static const struct usb_device_id option_ids[] = { { USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_COLT) }, { USB_DEVICE(OPTION_VENDOR_ID, OPTION_PRODUCT_RICOLA) }, @@ -1187,6 +1193,9 @@ static const struct usb_device_id option_ids[] = { { USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC15)}, { USB_DEVICE(QUALCOMM_VENDOR_ID, QUECTEL_PRODUCT_UC20), .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, + /* Yuga products use Qualcomm vendor ID */ + { USB_DEVICE(QUALCOMM_VENDOR_ID, YUGA_PRODUCT_CLM920_NC5), + .driver_info = (kernel_ulong_t)&yuga_clm920_nc5_blacklist }, /* Quectel products using Quectel vendor ID */ { USB_DEVICE(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC21), .driver_info = (kernel_ulong_t)&net_intf4_blacklist }, From b9d02d3c5899b0abcb2df72252d7fc2f57028bfb Mon Sep 17 00:00:00 2001 From: Dmitry Fleytman Dmitry Fleytman Date: Tue, 19 Dec 2017 06:02:04 +0200 Subject: [PATCH 0932/1013] usb: Add device quirk for Logitech HD Pro Webcam C925e commit 7f038d256c723dd390d2fca942919573995f4cfd upstream. Commit e0429362ab15 ("usb: Add device quirk for Logitech HD Pro Webcams C920 and C930e") introduced quirk to workaround an issue with some Logitech webcams. There is one more model that has the same issue - C925e, so applying the same quirk as well. See aforementioned commit message for detailed explanation of the problem. 
Signed-off-by: Dmitry Fleytman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/quirks.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 50010282c010..60674a932c77 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -57,10 +57,11 @@ static const struct usb_device_id usb_quirk_list[] = { /* Microsoft LifeCam-VX700 v2.0 */ { USB_DEVICE(0x045e, 0x0770), .driver_info = USB_QUIRK_RESET_RESUME }, - /* Logitech HD Pro Webcams C920, C920-C and C930e */ + /* Logitech HD Pro Webcams C920, C920-C, C925e and C930e */ { USB_DEVICE(0x046d, 0x082d), .driver_info = USB_QUIRK_DELAY_INIT }, { USB_DEVICE(0x046d, 0x0841), .driver_info = USB_QUIRK_DELAY_INIT }, { USB_DEVICE(0x046d, 0x0843), .driver_info = USB_QUIRK_DELAY_INIT }, + { USB_DEVICE(0x046d, 0x085b), .driver_info = USB_QUIRK_DELAY_INIT }, /* Logitech ConferenceCam CC3000e */ { USB_DEVICE(0x046d, 0x0847), .driver_info = USB_QUIRK_DELAY_INIT }, From e2f33e5983cb1609805e267a3d2ef46d78257b2c Mon Sep 17 00:00:00 2001 From: Oliver Neukum Date: Tue, 12 Dec 2017 16:11:30 +0100 Subject: [PATCH 0933/1013] usb: add RESET_RESUME for ELSA MicroLink 56K commit b9096d9f15c142574ebebe8fbb137012bb9d99c2 upstream. This modem needs this quirk to operate. It produces timeouts when resumed without reset. Signed-off-by: Oliver Neukum Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c index 60674a932c77..c05c4f877750 100644 --- a/drivers/usb/core/quirks.c +++ b/drivers/usb/core/quirks.c @@ -155,6 +155,9 @@ static const struct usb_device_id usb_quirk_list[] = { /* Genesys Logic hub, internally used by KY-688 USB 3.1 Type-C Hub */ { USB_DEVICE(0x05e3, 0x0612), .driver_info = USB_QUIRK_NO_LPM }, + /* ELSA MicroLink 56K */ + { USB_DEVICE(0x05cc, 0x2267), .driver_info = USB_QUIRK_RESET_RESUME }, + /* Genesys Logic hub, internally used by Moshi USB to Ethernet Adapter */ { USB_DEVICE(0x05e3, 0x0616), .driver_info = USB_QUIRK_NO_LPM }, From c7c4e00a66081ad6161f009a0c329b9a36085766 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Tue, 19 Dec 2017 11:14:42 +0200 Subject: [PATCH 0934/1013] USB: Fix off by one in type-specific length check of BOS SSP capability commit 07b9f12864d16c3a861aef4817eb1efccbc5d0e6 upstream. USB 3.1 devices are not detected as 3.1 capable since 4.15-rc3 due to a off by one in commit 81cf4a45360f ("USB: core: Add type-specific length check of BOS descriptors") It uses USB_DT_USB_SSP_CAP_SIZE() to get SSP capability size which takes the zero based SSAC as argument, not the actual count of sublink speed attributes. USB3 spec 9.6.2.5 says "The number of Sublink Speed Attributes = SSAC + 1." 
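A small standalone sketch of the arithmetic (SSP_CAP_SIZE below is a stand-in for USB_DT_USB_SSP_CAP_SIZE; the 16 + 4 * SSAC byte count is an assumption based on the SSP capability layout): with the extra "+ 1" the check demands one more sublink speed attribute than the descriptor actually carries, so a spec-compliant descriptor is rejected and dev->bos->ssp_cap is never set.

#include <stdio.h>

/* assumed stand-in for USB_DT_USB_SSP_CAP_SIZE(ssac), ssac zero-based */
#define SSP_CAP_SIZE(ssac)	(16 + (ssac) * 4)

int main(void)
{
	int ssac = 4;				/* SSAC + 1 = 5 attributes */
	int length = SSP_CAP_SIZE(ssac);	/* 32 bytes on the wire */

	/* old check: ssac was incremented to 5, demanding 36 bytes */
	printf("old: need %d, have %d -> %s\n", SSP_CAP_SIZE(ssac + 1), length,
	       length >= SSP_CAP_SIZE(ssac + 1) ? "accepted" : "rejected");
	/* fixed check: uses the zero-based value, demanding 32 bytes */
	printf("new: need %d, have %d -> %s\n", SSP_CAP_SIZE(ssac), length,
	       length >= SSP_CAP_SIZE(ssac) ? "accepted" : "rejected");
	return 0;
}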
The type-specific length check patch was added to stable and needs to be fixed there as well. Fixes: 81cf4a45360f ("USB: core: Add type-specific length check of BOS descriptors") CC: Masakazu Mokuno Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/core/config.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/core/config.c b/drivers/usb/core/config.c index 843ef46d2537..9e3355b97396 100644 --- a/drivers/usb/core/config.c +++ b/drivers/usb/core/config.c @@ -1007,7 +1007,7 @@ int usb_get_bos_descriptor(struct usb_device *dev) case USB_SSP_CAP_TYPE: ssp_cap = (struct usb_ssp_cap_descriptor *)buffer; ssac = (le32_to_cpu(ssp_cap->bmAttributes) & - USB_SSP_SUBLINK_SPEED_ATTRIBS) + 1; + USB_SSP_SUBLINK_SPEED_ATTRIBS); if (length >= USB_DT_USB_SSP_CAP_SIZE(ssac)) dev->bos->ssp_cap = ssp_cap; break; From f0f18aa8f70171f3e184b4185e673d7d4694300e Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Thu, 21 Dec 2017 15:06:15 +0200 Subject: [PATCH 0935/1013] usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201 commit da99706689481717998d1d48edd389f339eea979 upstream. When plugging in a USB webcam I see the following message: xhci_hcd 0000:04:00.0: WARN Successful completion on short TX: needs XHCI_TRUST_TX_LENGTH quirk? handle_tx_event: 913 callbacks suppressed All is quiet again with this patch (and I've done a fair bit of soak testing with the camera since). Signed-off-by: Daniel Thompson Acked-by: Ard Biesheuvel Signed-off-by: Mathias Nyman Signed-off-by: Greg Kroah-Hartman --- drivers/usb/host/xhci-pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 76f392954733..abb8f19ae40f 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -189,6 +189,9 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_TRUST_TX_LENGTH; xhci->quirks |= XHCI_BROKEN_STREAMS; } + if (pdev->vendor == PCI_VENDOR_ID_RENESAS && + pdev->device == 0x0014) + xhci->quirks |= XHCI_TRUST_TX_LENGTH; if (pdev->vendor == PCI_VENDOR_ID_RENESAS && pdev->device == 0x0015) xhci->quirks |= XHCI_RESET_ON_RESUME; From e4fb2e7e92ec638f0bd489c56b4b6d9004fd1f40 Mon Sep 17 00:00:00 2001 From: Anna-Maria Gleixner Date: Fri, 22 Dec 2017 15:51:12 +0100 Subject: [PATCH 0936/1013] timers: Use deferrable base independent of base::nohz_active commit ced6d5c11d3e7b342f1a80f908e6756ebd4b8ddd upstream. During boot and before base::nohz_active is set in the timer bases, deferrable timers are enqueued into the standard timer base. This works correctly as long as base::nohz_active is false. Once base::nohz_active is set and a timer which was enqueued before that is accessed, the lock selector code chooses the lock of the deferrable base. This causes unlocked access to the standard base and, in case the timer is removed, it does not clear the pending flag in the standard base bitmap, which causes get_next_timer_interrupt() to return bogus values. To prevent that, the deferrable timers must be enqueued in the deferrable base, even when base::nohz_active is not set. Those deferrable timers also need to be expired unconditionally.
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Signed-off-by: Anna-Maria Gleixner Signed-off-by: Thomas Gleixner Reviewed-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Sebastian Siewior Cc: rt@linutronix.de Cc: Paul McKenney Link: https://lkml.kernel.org/r/20171222145337.633328378@linutronix.de Signed-off-by: Greg Kroah-Hartman --- kernel/time/timer.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index f2674a056c26..fdfaf4f3bcfa 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -814,11 +814,10 @@ static inline struct timer_base *get_timer_cpu_base(u32 tflags, u32 cpu) struct timer_base *base = per_cpu_ptr(&timer_bases[BASE_STD], cpu); /* - * If the timer is deferrable and nohz is active then we need to use - * the deferrable base. + * If the timer is deferrable and NO_HZ_COMMON is set then we need + * to use the deferrable base. */ - if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active && - (tflags & TIMER_DEFERRABLE)) + if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE)) base = per_cpu_ptr(&timer_bases[BASE_DEF], cpu); return base; } @@ -828,11 +827,10 @@ static inline struct timer_base *get_timer_this_cpu_base(u32 tflags) struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]); /* - * If the timer is deferrable and nohz is active then we need to use - * the deferrable base. + * If the timer is deferrable and NO_HZ_COMMON is set then we need + * to use the deferrable base. */ - if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active && - (tflags & TIMER_DEFERRABLE)) + if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && (tflags & TIMER_DEFERRABLE)) base = this_cpu_ptr(&timer_bases[BASE_DEF]); return base; } @@ -1644,7 +1642,7 @@ static __latent_entropy void run_timer_softirq(struct softirq_action *h) base->must_forward_clk = false; __run_timers(base); - if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active) + if (IS_ENABLED(CONFIG_NO_HZ_COMMON)) __run_timers(this_cpu_ptr(&timer_bases[BASE_DEF])); } From 8f1aa64ab08621cc3d2b1600ab80da5d3128abb6 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 22 Dec 2017 15:51:14 +0100 Subject: [PATCH 0937/1013] timers: Invoke timer_start_debug() where it makes sense commit fd45bb77ad682be728d1002431d77b8c73342836 upstream. The timer start debug function is called before the proper timer base is set. As a consequence the trace data contains the stale CPU and flags values. Call the debug function after setting the new base and flags. 
Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Sebastian Siewior Cc: rt@linutronix.de Cc: Paul McKenney Cc: Anna-Maria Gleixner Link: https://lkml.kernel.org/r/20171222145337.792907137@linutronix.de Signed-off-by: Greg Kroah-Hartman --- kernel/time/timer.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/time/timer.c b/kernel/time/timer.c index fdfaf4f3bcfa..a4d095e1010e 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -982,8 +982,6 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only) if (!ret && pending_only) goto out_unlock; - debug_activate(timer, expires); - new_base = get_target_base(base, timer->flags); if (base != new_base) { @@ -1007,6 +1005,8 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only) } } + debug_activate(timer, expires); + timer->expires = expires; /* * If 'idx' was calculated above and the base time did not advance From 6fae6de72ad44e98b5ae58a662d110c58594aad9 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 27 Dec 2017 21:37:25 +0100 Subject: [PATCH 0938/1013] timers: Reinitialize per cpu bases on hotplug commit 26456f87aca7157c057de65c9414b37f1ab881d1 upstream. The timer wheel bases are not (re)initialized on CPU hotplug. That leaves them with a potentially stale clk and next_expiry valuem, which can cause trouble then the CPU is plugged. Add a prepare callback which forwards the clock, sets next_expiry to far in the future and reset the control flags to a known state. Set base->must_forward_clk so the first timer which is queued will try to forward the clock to current jiffies. Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel") Reported-by: Paul E. McKenney Signed-off-by: Thomas Gleixner Cc: Peter Zijlstra Cc: Frederic Weisbecker Cc: Sebastian Siewior Cc: Anna-Maria Gleixner Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272152200.2431@nanos Signed-off-by: Greg Kroah-Hartman --- include/linux/cpuhotplug.h | 2 +- include/linux/timer.h | 4 +++- kernel/cpu.c | 4 ++-- kernel/time/timer.c | 15 +++++++++++++++ 4 files changed, 21 insertions(+), 4 deletions(-) diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 2477a5cb5bd5..fb83dee528a1 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -86,7 +86,7 @@ enum cpuhp_state { CPUHP_MM_ZSWP_POOL_PREPARE, CPUHP_KVM_PPC_BOOK3S_PREPARE, CPUHP_ZCOMP_PREPARE, - CPUHP_TIMERS_DEAD, + CPUHP_TIMERS_PREPARE, CPUHP_MIPS_SOC_PREPARE, CPUHP_BP_PREPARE_DYN, CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20, diff --git a/include/linux/timer.h b/include/linux/timer.h index ac66f29c6916..e0ea1fe87572 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -246,9 +246,11 @@ unsigned long round_jiffies_up(unsigned long j); unsigned long round_jiffies_up_relative(unsigned long j); #ifdef CONFIG_HOTPLUG_CPU +int timers_prepare_cpu(unsigned int cpu); int timers_dead_cpu(unsigned int cpu); #else -#define timers_dead_cpu NULL +#define timers_prepare_cpu NULL +#define timers_dead_cpu NULL #endif #endif diff --git a/kernel/cpu.c b/kernel/cpu.c index 7891aecc6aec..f21bfa3172d8 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -1277,9 +1277,9 @@ static struct cpuhp_step cpuhp_bp_states[] = { * before blk_mq_queue_reinit_notify() from notify_dead(), * otherwise a RCU stall occurs. 
*/ - [CPUHP_TIMERS_DEAD] = { + [CPUHP_TIMERS_PREPARE] = { .name = "timers:dead", - .startup.single = NULL, + .startup.single = timers_prepare_cpu, .teardown.single = timers_dead_cpu, }, /* Kicks the plugged cpu into life */ diff --git a/kernel/time/timer.c b/kernel/time/timer.c index a4d095e1010e..73e3cdbc61f1 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -1801,6 +1801,21 @@ static void migrate_timer_list(struct timer_base *new_base, struct hlist_head *h } } +int timers_prepare_cpu(unsigned int cpu) +{ + struct timer_base *base; + int b; + + for (b = 0; b < NR_BASES; b++) { + base = per_cpu_ptr(&timer_bases[b], cpu); + base->clk = jiffies; + base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA; + base->is_idle = false; + base->must_forward_clk = true; + } + return 0; +} + int timers_dead_cpu(unsigned int cpu) { struct timer_base *old_base; From d87f1bc7d15b89bd3bcf31020eb7f3b3cd6f84b5 Mon Sep 17 00:00:00 2001 From: Todd Kjos Date: Mon, 27 Nov 2017 09:32:33 -0800 Subject: [PATCH 0939/1013] binder: fix proc->files use-after-free commit 7f3dc0088b98533f17128058fac73cd8b2752ef1 upstream. proc->files cleanup is initiated by binder_vma_close. Therefore a reference on the binder_proc is not enough to prevent the files_struct from being released while the binder_proc still has a reference. This can lead to an attempt to dereference the stale pointer obtained from proc->files prior to proc->files cleanup. This has been seen once in task_get_unused_fd_flags() when __alloc_fd() is called with a stale "files". The fix is to protect proc->files with a mutex to prevent cleanup while in use. Signed-off-by: Todd Kjos Signed-off-by: Greg Kroah-Hartman --- drivers/android/binder.c | 44 ++++++++++++++++++++++++++++------------ 1 file changed, 31 insertions(+), 13 deletions(-) diff --git a/drivers/android/binder.c b/drivers/android/binder.c index 88b4bbe58100..a340766b51fe 100644 --- a/drivers/android/binder.c +++ b/drivers/android/binder.c @@ -482,7 +482,8 @@ enum binder_deferred_state { * @tsk task_struct for group_leader of process * (invariant after initialized) * @files files_struct for process - * (invariant after initialized) + * (protected by @files_lock) + * @files_lock mutex to protect @files * @deferred_work_node: element for binder_deferred_list * (protected by binder_deferred_lock) * @deferred_work: bitmap of deferred work to perform @@ -530,6 +531,7 @@ struct binder_proc { int pid; struct task_struct *tsk; struct files_struct *files; + struct mutex files_lock; struct hlist_node deferred_work_node; int deferred_work; bool is_dead; @@ -877,20 +879,26 @@ static void binder_inc_node_tmpref_ilocked(struct binder_node *node); static int task_get_unused_fd_flags(struct binder_proc *proc, int flags) { - struct files_struct *files = proc->files; unsigned long rlim_cur; unsigned long irqs; + int ret; - if (files == NULL) - return -ESRCH; - - if (!lock_task_sighand(proc->tsk, &irqs)) - return -EMFILE; - + mutex_lock(&proc->files_lock); + if (proc->files == NULL) { + ret = -ESRCH; + goto err; + } + if (!lock_task_sighand(proc->tsk, &irqs)) { + ret = -EMFILE; + goto err; + } rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE); unlock_task_sighand(proc->tsk, &irqs); - return __alloc_fd(files, 0, rlim_cur, flags); + ret = __alloc_fd(proc->files, 0, rlim_cur, flags); +err: + mutex_unlock(&proc->files_lock); + return ret; } /* @@ -899,8 +907,10 @@ static int task_get_unused_fd_flags(struct binder_proc *proc, int flags) static void task_fd_install( struct binder_proc *proc, unsigned int fd, struct 
file *file) { + mutex_lock(&proc->files_lock); if (proc->files) __fd_install(proc->files, fd, file); + mutex_unlock(&proc->files_lock); } /* @@ -910,9 +920,11 @@ static long task_close_fd(struct binder_proc *proc, unsigned int fd) { int retval; - if (proc->files == NULL) - return -ESRCH; - + mutex_lock(&proc->files_lock); + if (proc->files == NULL) { + retval = -ESRCH; + goto err; + } retval = __close_fd(proc->files, fd); /* can't restart close syscall because file table entry was cleared */ if (unlikely(retval == -ERESTARTSYS || @@ -920,7 +932,8 @@ static long task_close_fd(struct binder_proc *proc, unsigned int fd) retval == -ERESTARTNOHAND || retval == -ERESTART_RESTARTBLOCK)) retval = -EINTR; - +err: + mutex_unlock(&proc->files_lock); return retval; } @@ -4627,7 +4640,9 @@ static int binder_mmap(struct file *filp, struct vm_area_struct *vma) ret = binder_alloc_mmap_handler(&proc->alloc, vma); if (ret) return ret; + mutex_lock(&proc->files_lock); proc->files = get_files_struct(current); + mutex_unlock(&proc->files_lock); return 0; err_bad_arg: @@ -4651,6 +4666,7 @@ static int binder_open(struct inode *nodp, struct file *filp) spin_lock_init(&proc->outer_lock); get_task_struct(current->group_leader); proc->tsk = current->group_leader; + mutex_init(&proc->files_lock); INIT_LIST_HEAD(&proc->todo); proc->default_priority = task_nice(current); binder_dev = container_of(filp->private_data, struct binder_device, @@ -4903,9 +4919,11 @@ static void binder_deferred_func(struct work_struct *work) files = NULL; if (defer & BINDER_DEFERRED_PUT_FILES) { + mutex_lock(&proc->files_lock); files = proc->files; if (files) proc->files = NULL; + mutex_unlock(&proc->files_lock); } if (defer & BINDER_DEFERRED_FLUSH) From f3f5fa872d09109edfd7c10c57865301fee396d4 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Wed, 15 Nov 2017 10:43:16 +0100 Subject: [PATCH 0940/1013] phy: tegra: fix device-tree node lookups commit 046046737bd35bed047460f080ea47e186be731e upstream. Fix child-node lookups during probe, which ended up searching the whole device tree depth-first starting at the parents rather than just matching on their children. To make things worse, some parent nodes could end up being being prematurely freed (by tegra_xusb_pad_register()) as of_find_node_by_name() drops a reference to its first argument. Fixes: 53d2a715c240 ("phy: Add Tegra XUSB pad controller support") Cc: Thierry Reding Signed-off-by: Johan Hovold Signed-off-by: Kishon Vijay Abraham I Signed-off-by: Greg Kroah-Hartman --- drivers/phy/tegra/xusb.c | 58 ++++++++++++++++++++-------------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c index 4307bf0013e1..63e916d4d069 100644 --- a/drivers/phy/tegra/xusb.c +++ b/drivers/phy/tegra/xusb.c @@ -75,14 +75,14 @@ MODULE_DEVICE_TABLE(of, tegra_xusb_padctl_of_match); static struct device_node * tegra_xusb_find_pad_node(struct tegra_xusb_padctl *padctl, const char *name) { - /* - * of_find_node_by_name() drops a reference, so make sure to grab one. 
- */ - struct device_node *np = of_node_get(padctl->dev->of_node); + struct device_node *pads, *np; + + pads = of_get_child_by_name(padctl->dev->of_node, "pads"); + if (!pads) + return NULL; - np = of_find_node_by_name(np, "pads"); - if (np) - np = of_find_node_by_name(np, name); + np = of_get_child_by_name(pads, name); + of_node_put(pads); return np; } @@ -90,16 +90,16 @@ tegra_xusb_find_pad_node(struct tegra_xusb_padctl *padctl, const char *name) static struct device_node * tegra_xusb_pad_find_phy_node(struct tegra_xusb_pad *pad, unsigned int index) { - /* - * of_find_node_by_name() drops a reference, so make sure to grab one. - */ - struct device_node *np = of_node_get(pad->dev.of_node); + struct device_node *np, *lanes; - np = of_find_node_by_name(np, "lanes"); - if (!np) + lanes = of_get_child_by_name(pad->dev.of_node, "lanes"); + if (!lanes) return NULL; - return of_find_node_by_name(np, pad->soc->lanes[index].name); + np = of_get_child_by_name(lanes, pad->soc->lanes[index].name); + of_node_put(lanes); + + return np; } static int @@ -195,7 +195,7 @@ int tegra_xusb_pad_register(struct tegra_xusb_pad *pad, unsigned int i; int err; - children = of_find_node_by_name(pad->dev.of_node, "lanes"); + children = of_get_child_by_name(pad->dev.of_node, "lanes"); if (!children) return -ENODEV; @@ -444,21 +444,21 @@ static struct device_node * tegra_xusb_find_port_node(struct tegra_xusb_padctl *padctl, const char *type, unsigned int index) { - /* - * of_find_node_by_name() drops a reference, so make sure to grab one. - */ - struct device_node *np = of_node_get(padctl->dev->of_node); + struct device_node *ports, *np; + char *name; - np = of_find_node_by_name(np, "ports"); - if (np) { - char *name; + ports = of_get_child_by_name(padctl->dev->of_node, "ports"); + if (!ports) + return NULL; - name = kasprintf(GFP_KERNEL, "%s-%u", type, index); - if (!name) - return ERR_PTR(-ENOMEM); - np = of_find_node_by_name(np, name); - kfree(name); + name = kasprintf(GFP_KERNEL, "%s-%u", type, index); + if (!name) { + of_node_put(ports); + return ERR_PTR(-ENOMEM); } + np = of_get_child_by_name(ports, name); + kfree(name); + of_node_put(ports); return np; } @@ -847,7 +847,7 @@ static void tegra_xusb_remove_ports(struct tegra_xusb_padctl *padctl) static int tegra_xusb_padctl_probe(struct platform_device *pdev) { - struct device_node *np = of_node_get(pdev->dev.of_node); + struct device_node *np = pdev->dev.of_node; const struct tegra_xusb_padctl_soc *soc; struct tegra_xusb_padctl *padctl; const struct of_device_id *match; @@ -855,7 +855,7 @@ static int tegra_xusb_padctl_probe(struct platform_device *pdev) int err; /* for backwards compatibility with old device trees */ - np = of_find_node_by_name(np, "pads"); + np = of_get_child_by_name(np, "pads"); if (!np) { dev_warn(&pdev->dev, "deprecated DT, using legacy driver\n"); return tegra_xusb_padctl_legacy_probe(pdev); From 2695c0f1f71e6022a6916584a4d19de8968e97c3 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Fri, 17 Nov 2017 11:56:41 +0000 Subject: [PATCH 0941/1013] drivers: base: cacheinfo: fix cache type for non-architected system cache commit f57ab9a01a36ef3454333251cc57e3a9948b17bf upstream. Commit dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for cache properties") doesn't initialise the cache type if it's present only in DT and the architecture is not aware of it. They are unified system level cache which are generally transparent. 
This patch check if the cache type is set to NOCACHE but the DT node indicates that it's unified cache and sets the cache type accordingly. Fixes: dfea747d2aba ("drivers: base: cacheinfo: support DT overrides for cache properties") Reported-and-tested-by: Tan Xiaojun Cc: Greg Kroah-Hartman Signed-off-by: Sudeep Holla Signed-off-by: Greg Kroah-Hartman --- drivers/base/cacheinfo.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/base/cacheinfo.c b/drivers/base/cacheinfo.c index eb3af2739537..07532d83be0b 100644 --- a/drivers/base/cacheinfo.c +++ b/drivers/base/cacheinfo.c @@ -186,6 +186,11 @@ static void cache_associativity(struct cacheinfo *this_leaf) this_leaf->ways_of_associativity = (size / nr_sets) / line_size; } +static bool cache_node_is_unified(struct cacheinfo *this_leaf) +{ + return of_property_read_bool(this_leaf->of_node, "cache-unified"); +} + static void cache_of_override_properties(unsigned int cpu) { int index; @@ -194,6 +199,14 @@ static void cache_of_override_properties(unsigned int cpu) for (index = 0; index < cache_leaves(cpu); index++) { this_leaf = this_cpu_ci->info_list + index; + /* + * init_cache_level must setup the cache level correctly + * overriding the architecturally specified levels, so + * if type is NONE at this stage, it should be unified + */ + if (this_leaf->type == CACHE_TYPE_NOCACHE && + cache_node_is_unified(this_leaf)) + this_leaf->type = CACHE_TYPE_UNIFIED; cache_size(this_leaf); cache_get_line_size(this_leaf); cache_nr_sets(this_leaf); From 7a3ce39c2bcac61abcb4650b33945335a02ef6cd Mon Sep 17 00:00:00 2001 From: Sushmita Susheelendra Date: Fri, 15 Dec 2017 13:59:13 -0700 Subject: [PATCH 0942/1013] staging: android: ion: Fix dma direction for dma_sync_sg_for_cpu/device commit d6b246bb7a29703f53aa4c050b8b3205d749caee upstream. Use the direction argument passed into begin_cpu_access and end_cpu_access when calling the dma_sync_sg_for_cpu/device. The actual cache primitive called depends on the direction passed in. Signed-off-by: Sushmita Susheelendra Acked-by: Laura Abbott Signed-off-by: Greg Kroah-Hartman --- drivers/staging/android/ion/ion.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/staging/android/ion/ion.c b/drivers/staging/android/ion/ion.c index 93e2c90fa77d..83dc3292e9ab 100644 --- a/drivers/staging/android/ion/ion.c +++ b/drivers/staging/android/ion/ion.c @@ -348,7 +348,7 @@ static int ion_dma_buf_begin_cpu_access(struct dma_buf *dmabuf, mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) { dma_sync_sg_for_cpu(a->dev, a->table->sgl, a->table->nents, - DMA_BIDIRECTIONAL); + direction); } mutex_unlock(&buffer->lock); @@ -370,7 +370,7 @@ static int ion_dma_buf_end_cpu_access(struct dma_buf *dmabuf, mutex_lock(&buffer->lock); list_for_each_entry(a, &buffer->attachments, list) { dma_sync_sg_for_device(a->dev, a->table->sgl, a->table->nents, - DMA_BIDIRECTIONAL); + direction); } mutex_unlock(&buffer->lock); From e798502cfb471b95153de2f75a89501c45ec997a Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 22 Dec 2017 15:51:13 +0100 Subject: [PATCH 0943/1013] nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick() commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream. 
The conditions in irq_exit() to invoke tick_nohz_irq_exit() which subsequently invokes tick_nohz_stop_sched_tick() are: if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu)) If need_resched() is not set, but a timer softirq is pending then this is an indication that the softirq code punted and delegated the execution to softirqd. need_resched() is not true because the current interrupted task takes precedence over softirqd. Invoking tick_nohz_irq_exit() in this case can cause an endless loop of timer interrupts because the timer wheel contains an expired timer, but softirqs are not yet executed. So it returns an immediate expiry request, which causes the timer to fire immediately again. Lather, rinse and repeat.... Prevent that by adding a check for a pending timer soft interrupt to the conditions in tick_nohz_stop_sched_tick() which avoid calling get_next_timer_interrupt(). That keeps the tick sched timer on the tick and prevents a repetitive programming of an already expired timer. Reported-by: Sebastian Siewior Signed-off-by: Thomas Gleixner Acked-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Paul McKenney Cc: Anna-Maria Gleixner Cc: Sebastian Siewior Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos Signed-off-by: Greg Kroah-Hartman --- kernel/time/tick-sched.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index e4d463adefc2..dfa4a117fee3 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -674,6 +674,11 @@ static void tick_nohz_restart(struct tick_sched *ts, ktime_t now) ts->next_tick = 0; } +static inline bool local_timer_softirq_pending(void) +{ + return local_softirq_pending() & TIMER_SOFTIRQ; +} + static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, ktime_t now, int cpu) { @@ -690,8 +695,18 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, } while (read_seqretry(&jiffies_lock, seq)); ts->last_jiffies = basejiff; - if (rcu_needs_cpu(basemono, &next_rcu) || - arch_needs_cpu() || irq_work_needs_cpu()) { + /* + * Keep the periodic tick, when RCU, architecture or irq_work + * requests it. + * Aside of that check whether the local timer softirq is + * pending. If so its a bad idea to call get_next_timer_interrupt() + * because there is an already expired timer, so it will request + * immeditate expiry, which rearms the hardware timer with a + * minimal delta which brings us back to this place + * immediately. Lather, rinse and repeat... + */ + if (rcu_needs_cpu(basemono, &next_rcu) || arch_needs_cpu() || + irq_work_needs_cpu() || local_timer_softirq_pending()) { next_tick = basemono + TICK_NSEC; } else { /* From b5bef29785ff990a90238d4f59ccbeb656134eeb Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 30 Dec 2017 22:13:53 +0100 Subject: [PATCH 0944/1013] x86/smpboot: Remove stale TLB flush invocations commit 322f8b8b340c824aef891342b0f5795d15e11562 upstream. smpboot_setup_warm_reset_vector() and smpboot_restore_warm_reset_vector() invoke local_flush_tlb() for no obvious reason. Digging in history revealed that the original code in the 2.1 era added those because the code manipulated a swapper_pg_dir pagetable entry. The pagetable manipulation was removed long ago in the 2.3 timeframe, but the TLB flush invocations stayed around forever. Remove them along with the pointless pr_debug()s which come from the same 2.1 change. 
Reported-by: Dominik Brodowski Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Linus Torvalds Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171230211829.586548655@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/smpboot.c | 9 --------- 1 file changed, 9 deletions(-) diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 12bf07d44dfe..2651ca2112c4 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -128,25 +128,16 @@ static inline void smpboot_setup_warm_reset_vector(unsigned long start_eip) spin_lock_irqsave(&rtc_lock, flags); CMOS_WRITE(0xa, 0xf); spin_unlock_irqrestore(&rtc_lock, flags); - local_flush_tlb(); - pr_debug("1.\n"); *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_HIGH)) = start_eip >> 4; - pr_debug("2.\n"); *((volatile unsigned short *)phys_to_virt(TRAMPOLINE_PHYS_LOW)) = start_eip & 0xf; - pr_debug("3.\n"); } static inline void smpboot_restore_warm_reset_vector(void) { unsigned long flags; - /* - * Install writable page 0 entry to set BIOS data area. - */ - local_flush_tlb(); - /* * Paranoid: Set warm reset code and vector here back * to default values. From 082b7521a541a66f1fed68b1f76de6a3b910495e Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 30 Dec 2017 22:13:54 +0100 Subject: [PATCH 0945/1013] x86/mm: Remove preempt_disable/enable() from __native_flush_tlb() commit decab0888e6e14e11d53cefa85f8b3d3b45ce73c upstream. The preempt_disable/enable() pair in __native_flush_tlb() was added in commit: 5cf0791da5c1 ("x86/mm: Disable preemption during CR3 read+write") ... to protect the UP variant of flush_tlb_mm_range(). That preempt_disable/enable() pair should have been added to the UP variant of flush_tlb_mm_range() instead. The UP variant was removed with commit: ce4a4e565f52 ("x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code") ... but the preempt_disable/enable() pair stayed around. The latest change to __native_flush_tlb() in commit: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") ... added an access to a per CPU variable outside the preempt disabled regions, which makes no sense at all. __native_flush_tlb() must always be called with at least preemption disabled. Remove the preempt_disable/enable() pair and add a WARN_ON_ONCE() to catch bad callers independent of the smp_processor_id() debugging. 
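As a hedged illustration of the calling convention this change enforces (not taken from the patch; assumes a kernel context where <linux/preempt.h> and <asm/tlbflush.h> are available):

	/*
	 * Callers now have to provide their own protection, since
	 * __native_flush_tlb() no longer disables preemption itself.
	 */
	preempt_disable();
	__native_flush_tlb();	/* WARN_ON_ONCE(preemptible()) stays quiet here */
	preempt_enable();

The new WARN_ON_ONCE() catches callers that forget this, independent of the smp_processor_id() debugging mentioned above.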
Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Dominik Brodowski Cc: Linus Torvalds Cc: Linus Torvalds Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20171230211829.679325424@linutronix.de Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/tlbflush.h | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index b519da4fc03c..f9b48ce152eb 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -345,15 +345,17 @@ static inline void invalidate_user_asid(u16 asid) */ static inline void __native_flush_tlb(void) { - invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); /* - * If current->mm == NULL then we borrow a mm which may change - * during a task switch and therefore we must not be preempted - * while we write CR3 back: + * Preemption or interrupts must be disabled to protect the access + * to the per CPU variable and to prevent being preempted between + * read_cr3() and write_cr3(). */ - preempt_disable(); + WARN_ON_ONCE(preemptible()); + + invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + + /* If current->mm == NULL then the read_cr3() "borrows" an mm */ native_write_cr3(__native_read_cr3()); - preempt_enable(); } /* From 530f5fa1600b6af56b138f9fb87b7f5148c979a0 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 27 Dec 2017 11:48:50 -0800 Subject: [PATCH 0946/1013] x86-32: Fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR) commit ac461122c88a10b7d775de2f56467f097c9e627a upstream. Commit e802a51ede91 ("x86/idt: Consolidate IDT invalidation") cleaned up and unified the IDT invalidation that existed in a couple of places. It changed no actual real code. Despite not changing any actual real code, it _did_ change code generation: by implementing the common idt_invalidate() function in archx86/kernel/idt.c, it made the use of the function in arch/x86/kernel/machine_kexec_32.c be a real function call rather than an (accidental) inlining of the function. That, in turn, exposed two issues: - in load_segments(), we had incorrectly reset all the segment registers, which then made the stack canary load (which gcc does using offset of %gs) cause a trap. Instead of %gs pointing to the stack canary, it will be the normal zero-based kernel segment, and the stack canary load will take a page fault at address 0x14. - to make this even harder to debug, we had invalidated the GDT just before calling idt_invalidate(), which meant that the fault happened with an invalid GDT, which in turn causes a triple fault and immediate reboot. Fix this by (a) not reloading the special segments in load_segments(). We currently don't do any percpu accesses (which would require %fs on x86-32) in this area, but there's no reason to think that we might not want to do them, and like %gs, it's pointless to break it. (b) doing idt_invalidate() before invalidating the GDT, to keep things at least _slightly_ more debuggable for a bit longer. Without a IDT, traps will not work. Without a GDT, traps also will not work, but neither will any segment loads etc. So in a very real sense, the GDT is even more core than the IDT. 
Fixes: e802a51ede91 ("x86/idt: Consolidate IDT invalidation") Reported-and-tested-by: Alexandru Chirvasitu Signed-off-by: Linus Torvalds Signed-off-by: Thomas Gleixner Cc: Denys Vlasenko Cc: Peter Zijlstra Cc: Brian Gerst Cc: Steven Rostedt Cc: Borislav Petkov Cc: Andy Lutomirski Cc: Josh Poimboeuf Link: https://lkml.kernel.org/r/alpine.LFD.2.21.1712271143180.8572@i7.lan Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/machine_kexec_32.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/kernel/machine_kexec_32.c b/arch/x86/kernel/machine_kexec_32.c index 00bc751c861c..edfede768688 100644 --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -48,8 +48,6 @@ static void load_segments(void) "\tmovl $"STR(__KERNEL_DS)",%%eax\n" "\tmovl %%eax,%%ds\n" "\tmovl %%eax,%%es\n" - "\tmovl %%eax,%%fs\n" - "\tmovl %%eax,%%gs\n" "\tmovl %%eax,%%ss\n" : : : "eax", "memory"); #undef STR @@ -232,8 +230,8 @@ void machine_kexec(struct kimage *image) * The gdt & idt are now invalid. * If you want to load them you must set up your own idt & gdt. */ - set_gdt(phys_to_virt(0), 0); idt_invalidate(phys_to_virt(0)); + set_gdt(phys_to_virt(0), 0); /* now call it */ image->start = relocate_kernel_ptr((unsigned long)image->head, From cf6c3f7f4b1305596e261fb90201b737ad79e0d6 Mon Sep 17 00:00:00 2001 From: Andy Lutomirski Date: Tue, 12 Dec 2017 07:56:36 -0800 Subject: [PATCH 0947/1013] x86/espfix/64: Fix espfix double-fault handling on 5-level systems commit c739f930be1dd5fd949030e3475a884fe06dae9b upstream. Using PGDIR_SHIFT to identify espfix64 addresses on 5-level systems was wrong, and it resulted in panics due to unhandled double faults. Use P4D_SHIFT instead, which is correct on 4-level and 5-level machines. This fixes a panic when running x86 selftests on 5-level machines. Signed-off-by: Andy Lutomirski Acked-by: Kirill A. Shutemov Cc: Borislav Petkov Cc: Brian Gerst Cc: Dave Hansen Cc: David Laight Cc: Kees Cook Cc: Kirill A. Shutemov Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Fixes: 1d33b219563f ("x86/espfix: Add support for 5-level paging") Link: http://lkml.kernel.org/r/24c898b4f44fdf8c22d93703850fb384ef87cfdc.1513035461.git.luto@kernel.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/traps.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 7c16fe0b60c2..b33e860d32fe 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -361,7 +361,7 @@ dotraplinkage void do_double_fault(struct pt_regs *regs, long error_code) * * No need for ist_enter here because we don't use RCU. */ - if (((long)regs->sp >> PGDIR_SHIFT) == ESPFIX_PGD_ENTRY && + if (((long)regs->sp >> P4D_SHIFT) == ESPFIX_PGD_ENTRY && regs->cs == __KERNEL_CS && regs->ip == (unsigned long)native_irq_return_iret) { From 3e133155f22d59b0085ef63cbcbcfbd96c4d0b70 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 31 Dec 2017 11:24:34 +0100 Subject: [PATCH 0948/1013] x86/ldt: Plug memory leak in error path commit a62d69857aab4caa43049e72fe0ed5c4a60518dd upstream. The error path in write_ldt() tries to free 'old_ldt' instead of the newly allocated 'new_ldt', resulting in a memory leak. It also misses to clean up a half populated LDT pagetable, which is not a leak as it gets cleaned up when the process exits. Free both the potentially half populated LDT pagetable and the newly allocated LDT struct. 
This can be done unconditionally because once an LDT is mapped subsequent maps will succeed, because the PTE page is already populated and the two LDTs fit into that single page. Reported-by: Mathieu Desnoyers Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Borislav Petkov Cc: Dave Hansen Cc: Dominik Brodowski Cc: Linus Torvalds Cc: Linus Torvalds Cc: Peter Zijlstra Fixes: f55f0501cbf6 ("x86/pti: Put the LDT in its own PGD if PTI is on") Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1712311121340.1899@nanos Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/ldt.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 579cc4a66fdf..500e90e44f86 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -421,7 +421,13 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) */ error = map_ldt_struct(mm, new_ldt, old_ldt ? !old_ldt->slot : 0); if (error) { - free_ldt_struct(old_ldt); + /* + * This only can fail for the first LDT setup. If an LDT is + * already installed then the PTE page is already + * populated. Mop up a half populated page table. + */ + free_ldt_pgtables(mm); + free_ldt_struct(new_ldt); goto out_unlock; } From 57849de13c7da3385611a1b9aca5c6040ef7dfe2 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sun, 31 Dec 2017 16:52:15 +0100 Subject: [PATCH 0949/1013] x86/ldt: Make LDT pgtable free conditional commit 7f414195b0c3612acd12b4611a5fe75995cf10c7 upstream. Andy prefers to be paranoid about the pagetable free in the error path of write_ldt(). Make it conditional and warn whenever the installment of a secondary LDT fails. Requested-by: Andy Lutomirski Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/ldt.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/ldt.c b/arch/x86/kernel/ldt.c index 500e90e44f86..26d713ecad34 100644 --- a/arch/x86/kernel/ldt.c +++ b/arch/x86/kernel/ldt.c @@ -426,7 +426,8 @@ static int write_ldt(void __user *ptr, unsigned long bytecount, int oldmode) * already installed then the PTE page is already * populated. Mop up a half populated page table. */ - free_ldt_pgtables(mm); + if (!WARN_ON_ONCE(old_ldt)) + free_ldt_pgtables(mm); free_ldt_struct(new_ldt); goto out_unlock; } From aaa5a91ff744f91fb1d1c91853aa0c8f126be563 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 20 Dec 2017 17:57:06 -0800 Subject: [PATCH 0950/1013] n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD) commit 966031f340185eddd05affcf72b740549f056348 upstream. We added support for EXTPROC back in 2010 in commit 26df6d13406d ("tty: Add EXTPROC support for LINEMODE") and the intent was to allow it to override some (all?) ICANON behavior. Quoting from that original commit message: There is a new bit in the termios local flag word, EXTPROC. When this bit is set, several aspects of the terminal driver are disabled. Input line editing, character echo, and mapping of signals are all disabled. This allows the telnetd to turn off these functions when in linemode, but still keep track of what state the user wants the terminal to be in. but the problem turns out that "several aspects of the terminal driver are disabled" is a bit ambiguous, and you can really confuse the n_tty layer by setting EXTPROC and then causing some of the ICANON invariants to no longer be maintained. 
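For illustration, a minimal userspace sketch of the combination in question (hypothetical, not part of the patch; EXTPROC may need to be defined by hand depending on the libc):

	#include <stdio.h>
	#include <unistd.h>
	#include <termios.h>
	#include <sys/ioctl.h>

	#ifndef EXTPROC
	#define EXTPROC 0200000	/* value on most Linux architectures */
	#endif

	int main(void)
	{
		struct termios t;
		int queued = 0;

		if (tcgetattr(STDIN_FILENO, &t))
			return 1;
		t.c_lflag |= ICANON | EXTPROC;	/* ICANON still set, EXTPROC overrides it */
		if (tcsetattr(STDIN_FILENO, TCSANOW, &t))
			return 1;

		/* TIOCINQ should now report queued bytes, not canonical lines */
		if (ioctl(STDIN_FILENO, TIOCINQ, &queued))
			return 1;
		printf("%d byte(s) queued\n", queued);
		return 0;
	}

With EXTPROC set, the change below makes TIOCINQ report the raw byte count (read_cnt()) rather than the canonical line accounting (inq_canon()).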
This fixes at least one such case (TIOCINQ) becoming unhappy because of the confusion over whether ICANON really means ICANON when EXTPROC is set. This basically makes TIOCINQ match the case of read: if EXTPROC is set, we ignore ICANON. Also, make sure to reset the ICANON state ie EXTPROC changes, not just if ICANON changes. Fixes: 26df6d13406d ("tty: Add EXTPROC support for LINEMODE") Reported-by: Tetsuo Handa Reported-by: syzkaller Cc: Jiri Slaby Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- drivers/tty/n_tty.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index bdf0e6e89991..faf50df81622 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1764,7 +1764,7 @@ static void n_tty_set_termios(struct tty_struct *tty, struct ktermios *old) { struct n_tty_data *ldata = tty->disc_data; - if (!old || (old->c_lflag ^ tty->termios.c_lflag) & ICANON) { + if (!old || (old->c_lflag ^ tty->termios.c_lflag) & (ICANON | EXTPROC)) { bitmap_zero(ldata->read_flags, N_TTY_BUF_SIZE); ldata->line_start = ldata->read_tail; if (!L_ICANON(tty) || !read_cnt(ldata)) { @@ -2427,7 +2427,7 @@ static int n_tty_ioctl(struct tty_struct *tty, struct file *file, return put_user(tty_chars_in_buffer(tty), (int __user *) arg); case TIOCINQ: down_write(&tty->termios_rwsem); - if (L_ICANON(tty)) + if (L_ICANON(tty) && !L_EXTPROC(tty)) retval = inq_canon(ldata); else retval = read_cnt(ldata); From 3ade66602bb7ea348ed8cbafcdd56eb1826c2177 Mon Sep 17 00:00:00 2001 From: Johan Hovold Date: Fri, 3 Nov 2017 15:18:05 +0100 Subject: [PATCH 0951/1013] tty: fix tty_ldisc_receive_buf() documentation commit e7e51dcf3b8a5f65c5653a054ad57eb2492a90d0 upstream. The tty_ldisc_receive_buf() helper returns the number of bytes processed so drop the bogus "not" from the kernel doc comment. Fixes: 8d082cd300ab ("tty: Unify receive_buf() code paths") Signed-off-by: Johan Hovold Signed-off-by: Greg Kroah-Hartman --- drivers/tty/tty_buffer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/tty/tty_buffer.c b/drivers/tty/tty_buffer.c index f8eba1c5412f..677fa99b7747 100644 --- a/drivers/tty/tty_buffer.c +++ b/drivers/tty/tty_buffer.c @@ -446,7 +446,7 @@ EXPORT_SYMBOL_GPL(tty_prepare_flip_string); * Callers other than flush_to_ldisc() need to exclude the kworker * from concurrent use of the line discipline, see paste_selection(). * - * Returns the number of bytes not processed + * Returns the number of bytes processed */ int tty_ldisc_receive_buf(struct tty_ldisc *ld, const unsigned char *p, char *f, int count) From 0d59679df5b53755c00ea0292df696f97bfc950d Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 2 Jan 2018 20:31:17 +0100 Subject: [PATCH 0952/1013] Linux 4.14.11 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 930fa93f397d..655887067dc7 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 10 +SUBLEVEL = 11 EXTRAVERSION = NAME = Petit Gorille From 611583d327961acad4405063fc7c690ac2cf21c3 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 2 Jan 2018 15:21:33 -0800 Subject: [PATCH 0953/1013] exec: Weaken dumpability for secureexec commit e816c201aed5232171f8eb80b5d46ae6516683b9 upstream. 
This is a logical revert of commit e37fdb785a5f ("exec: Use secureexec for setting dumpability") This weakens dumpability back to checking only for uid/gid changes in current (which is useless), but userspace depends on dumpability not being tied to secureexec. https://bugzilla.redhat.com/show_bug.cgi?id=1528633 Reported-by: Tom Horsley Fixes: e37fdb785a5f ("exec: Use secureexec for setting dumpability") Signed-off-by: Kees Cook Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/exec.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/fs/exec.c b/fs/exec.c index 3e14ba25f678..acec119fcc31 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1350,9 +1350,14 @@ void setup_new_exec(struct linux_binprm * bprm) current->sas_ss_sp = current->sas_ss_size = 0; - /* Figure out dumpability. */ + /* + * Figure out dumpability. Note that this checking only of current + * is wrong, but userspace depends on it. This should be testing + * bprm->secureexec instead. + */ if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP || - bprm->secureexec) + !(uid_eq(current_euid(), current_uid()) && + gid_eq(current_egid(), current_gid()))) set_dumpable(current->mm, suid_dumpable); else set_dumpable(current->mm, SUID_DUMP_USER); From df4373c513b310d75aafeb8902917e7f8d0fe6a5 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 1 Jan 2018 09:28:31 -0600 Subject: [PATCH 0954/1013] capabilities: fix buffer overread on very short xattr commit dc32b5c3e6e2ef29cef76d9ce1b92d394446150e upstream. If userspace attempted to set a "security.capability" xattr shorter than 4 bytes (e.g. 'setfattr -n security.capability -v x file'), then cap_convert_nscap() read past the end of the buffer containing the xattr value because it accessed the ->magic_etc field without verifying that the xattr value is long enough to contain that field. Fix it by validating the xattr value size first. This bug was found using syzkaller with KASAN. 
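The trigger is small enough to write out; a hedged C equivalent of the setfattr command quoted above (illustrative only):

	#include <stdio.h>
	#include <sys/xattr.h>

	int main(int argc, char **argv)
	{
		const char *path = argc > 1 ? argv[1] : "file";

		/*
		 * A 1-byte value is shorter than the 4-byte magic_etc field
		 * that cap_convert_nscap() used to read unconditionally.
		 */
		if (setxattr(path, "security.capability", "x", 1, 0))
			perror("setxattr");
		return 0;
	}

With the size check added below, such a value is rejected with -EINVAL before any field of the header is dereferenced.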
The KASAN report was as follows (cleaned up slightly): BUG: KASAN: slab-out-of-bounds in cap_convert_nscap+0x514/0x630 security/commoncap.c:498 Read of size 4 at addr ffff88002d8741c0 by task syz-executor1/2852 CPU: 0 PID: 2852 Comm: syz-executor1 Not tainted 4.15.0-rc6-00200-gcc0aac99d977 #253 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0xe3/0x195 lib/dump_stack.c:53 print_address_description+0x73/0x260 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x235/0x350 mm/kasan/report.c:409 cap_convert_nscap+0x514/0x630 security/commoncap.c:498 setxattr+0x2bd/0x350 fs/xattr.c:446 path_setxattr+0x168/0x1b0 fs/xattr.c:472 SYSC_setxattr fs/xattr.c:487 [inline] SyS_setxattr+0x36/0x50 fs/xattr.c:483 entry_SYSCALL_64_fastpath+0x18/0x85 Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities") Signed-off-by: Eric Biggers Reviewed-by: Serge Hallyn Signed-off-by: James Morris Signed-off-by: Greg Kroah-Hartman --- security/commoncap.c | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) diff --git a/security/commoncap.c b/security/commoncap.c index fc46f5b85251..7b01431d1e19 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -348,21 +348,18 @@ static __u32 sansflags(__u32 m) return m & ~VFS_CAP_FLAGS_EFFECTIVE; } -static bool is_v2header(size_t size, __le32 magic) +static bool is_v2header(size_t size, const struct vfs_cap_data *cap) { - __u32 m = le32_to_cpu(magic); if (size != XATTR_CAPS_SZ_2) return false; - return sansflags(m) == VFS_CAP_REVISION_2; + return sansflags(le32_to_cpu(cap->magic_etc)) == VFS_CAP_REVISION_2; } -static bool is_v3header(size_t size, __le32 magic) +static bool is_v3header(size_t size, const struct vfs_cap_data *cap) { - __u32 m = le32_to_cpu(magic); - if (size != XATTR_CAPS_SZ_3) return false; - return sansflags(m) == VFS_CAP_REVISION_3; + return sansflags(le32_to_cpu(cap->magic_etc)) == VFS_CAP_REVISION_3; } /* @@ -405,7 +402,7 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer, fs_ns = inode->i_sb->s_user_ns; cap = (struct vfs_cap_data *) tmpbuf; - if (is_v2header((size_t) ret, cap->magic_etc)) { + if (is_v2header((size_t) ret, cap)) { /* If this is sizeof(vfs_cap_data) then we're ok with the * on-disk value, so return that. 
*/ if (alloc) @@ -413,7 +410,7 @@ int cap_inode_getsecurity(struct inode *inode, const char *name, void **buffer, else kfree(tmpbuf); return ret; - } else if (!is_v3header((size_t) ret, cap->magic_etc)) { + } else if (!is_v3header((size_t) ret, cap)) { kfree(tmpbuf); return -EINVAL; } @@ -470,9 +467,9 @@ static kuid_t rootid_from_xattr(const void *value, size_t size, return make_kuid(task_ns, rootid); } -static bool validheader(size_t size, __le32 magic) +static bool validheader(size_t size, const struct vfs_cap_data *cap) { - return is_v2header(size, magic) || is_v3header(size, magic); + return is_v2header(size, cap) || is_v3header(size, cap); } /* @@ -495,7 +492,7 @@ int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size) if (!*ivalue) return -EINVAL; - if (!validheader(size, cap->magic_etc)) + if (!validheader(size, cap)) return -EINVAL; if (!capable_wrt_inode_uidgid(inode, CAP_SETFCAP)) return -EPERM; From 151d7039757b71ebd9d170af0944562f51149372 Mon Sep 17 00:00:00 2001 From: Tom Lendacky Date: Tue, 26 Dec 2017 23:43:54 -0600 Subject: [PATCH 0955/1013] x86/cpu, x86/pti: Do not enable PTI on AMD processors commit 694d99d40972f12e59a3696effee8a376b79d7c8 upstream. AMD processors are not subject to the types of attacks that the kernel page table isolation feature protects against. The AMD microarchitecture does not allow memory references, including speculative references, that access higher privileged data when running in a lesser privileged mode when that access would result in a page fault. Disable page table isolation by default on AMD processors by not setting the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI is set. Signed-off-by: Tom Lendacky Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Cc: Dave Hansen Cc: Andy Lutomirski Link: https://lkml.kernel.org/r/20171227054354.20369.94587.stgit@tlendack-t1.amdoffice.net Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/common.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index f2a94dfb434e..b1be494ab4e8 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -899,8 +899,8 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) setup_force_cpu_cap(X86_FEATURE_ALWAYS); - /* Assume for now that ALL x86 CPUs are insecure */ - setup_force_cpu_bug(X86_BUG_CPU_INSECURE); + if (c->x86_vendor != X86_VENDOR_AMD) + setup_force_cpu_bug(X86_BUG_CPU_INSECURE); fpu__init_system(c); From 211ad3fdf63383772d492add846da4b0b2266531 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 3 Jan 2018 15:57:59 +0100 Subject: [PATCH 0956/1013] x86/pti: Make sure the user/kernel PTEs match commit 52994c256df36fda9a715697431cba9daecb6b11 upstream. Meelis reported that his K8 Athlon64 emits MCE warnings when PTI is enabled: [Hardware Error]: Error Addr: 0x0000ffff81e000e0 [Hardware Error]: MC1 Error: L1 TLB multimatch. [Hardware Error]: cache level: L1, tx: INSN The address is in the entry area, which is mapped into kernel _AND_ user space. That's special because we switch CR3 while we are executing there. User mapping: 0xffffffff81e00000-0xffffffff82000000 2M ro PSE GLB x pmd Kernel mapping: 0xffffffff81000000-0xffffffff82000000 16M ro PSE x pmd So the K8 is complaining that the TLB entries differ. They differ in the GLB bit. Drop the GLB bit when installing the user shared mapping. 
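One note for readers of the hunk below, hedged since the helper itself is not shown in this diff: in the 4.14 pti.c code the last argument of pti_clone_pmds() is a mask of PMD flag bits to clear when the kernel PMD is copied into the user page tables, roughly:

	/* simplified sketch of pti_clone_pmds(start, end, clear) */
	*user_pmd = pmd_clear_flags(*kernel_pmd, clear);

so passing _PAGE_RW | _PAGE_GLOBAL below is what actually drops the GLB bit from the user shared mapping, even though the constant is OR-ed into the call.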
Fixes: 6dc72c3cbca0 ("x86/mm/pti: Share entry text PMD") Reported-by: Meelis Roos Signed-off-by: Thomas Gleixner Tested-by: Meelis Roos Cc: Borislav Petkov Cc: Tom Lendacky Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031407180.1957@nanos Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/pti.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index bce8aea65606..2da28ba97508 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -367,7 +367,8 @@ static void __init pti_setup_espfix64(void) static void __init pti_clone_entry_text(void) { pti_clone_pmds((unsigned long) __entry_text_start, - (unsigned long) __irqentry_text_end, _PAGE_RW); + (unsigned long) __irqentry_text_end, + _PAGE_RW | _PAGE_GLOBAL); } /* From a841ba8f4d7e37acc9d46f982f418ee8e876a752 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Sun, 31 Dec 2017 10:18:06 -0600 Subject: [PATCH 0957/1013] x86/dumpstack: Fix partial register dumps MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit a9cdbe72c4e8bf3b38781c317a79326e2e1a230d upstream. The show_regs_safe() logic is wrong. When there's an iret stack frame, it prints the entire pt_regs -- most of which is random stack data -- instead of just the five registers at the end. show_regs_safe() is also poorly named: the on_stack() checks aren't for safety. Rename the function to show_regs_if_on_stack() and add a comment to explain why the checks are needed. These issues were introduced with the "partial register dump" feature of the following commit: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") That patch had gone through a few iterations of development, and the above issues were artifacts from a previous iteration of the patch where 'regs' pointed directly to the iret frame rather than to the (partially empty) pt_regs. Tested-by: Alexander Tsoy Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Toralf Förster Fixes: b02fcf9ba121 ("x86/unwinder: Handle stack overflows more gracefully") Link: http://lkml.kernel.org/r/5b05b8b344f59db2d3d50dbdeba92d60f2304c54.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/unwind.h | 17 +++++++++++++---- arch/x86/kernel/dumpstack.c | 28 ++++++++++++++++++++-------- arch/x86/kernel/stacktrace.c | 2 +- 3 files changed, 34 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h index c1688c2d0a12..1f86e1b0a5cd 100644 --- a/arch/x86/include/asm/unwind.h +++ b/arch/x86/include/asm/unwind.h @@ -56,18 +56,27 @@ void unwind_start(struct unwind_state *state, struct task_struct *task, #if defined(CONFIG_UNWINDER_ORC) || defined(CONFIG_UNWINDER_FRAME_POINTER) /* - * WARNING: The entire pt_regs may not be safe to dereference. In some cases, - * only the iret frame registers are accessible. Use with caution! + * If 'partial' returns true, only the iret frame registers are valid. 
*/ -static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) +static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state, + bool *partial) { if (unwind_done(state)) return NULL; + if (partial) { +#ifdef CONFIG_UNWINDER_ORC + *partial = !state->full_regs; +#else + *partial = false; +#endif + } + return state->regs; } #else -static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state) +static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state, + bool *partial) { return NULL; } diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index 5fa110699ed2..d0bb176a7261 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -76,12 +76,23 @@ void show_iret_regs(struct pt_regs *regs) regs->sp, regs->flags); } -static void show_regs_safe(struct stack_info *info, struct pt_regs *regs) +static void show_regs_if_on_stack(struct stack_info *info, struct pt_regs *regs, + bool partial) { - if (on_stack(info, regs, sizeof(*regs))) + /* + * These on_stack() checks aren't strictly necessary: the unwind code + * has already validated the 'regs' pointer. The checks are done for + * ordering reasons: if the registers are on the next stack, we don't + * want to print them out yet. Otherwise they'll be shown as part of + * the wrong stack. Later, when show_trace_log_lvl() switches to the + * next stack, this function will be called again with the same regs so + * they can be printed in the right context. + */ + if (!partial && on_stack(info, regs, sizeof(*regs))) { __show_regs(regs, 0); - else if (on_stack(info, (void *)regs + IRET_FRAME_OFFSET, - IRET_FRAME_SIZE)) { + + } else if (partial && on_stack(info, (void *)regs + IRET_FRAME_OFFSET, + IRET_FRAME_SIZE)) { /* * When an interrupt or exception occurs in entry code, the * full pt_regs might not have been saved yet. In that case @@ -98,6 +109,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, struct stack_info stack_info = {0}; unsigned long visit_mask = 0; int graph_idx = 0; + bool partial; printk("%sCall Trace:\n", log_lvl); @@ -140,7 +152,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, printk("%s <%s>\n", log_lvl, stack_name); if (regs) - show_regs_safe(&stack_info, regs); + show_regs_if_on_stack(&stack_info, regs, partial); /* * Scan the stack, printing any text addresses we find. At the @@ -164,7 +176,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, /* * Don't print regs->ip again if it was already printed - * by show_regs_safe() below. + * by show_regs_if_on_stack(). 
*/ if (regs && stack == ®s->ip) goto next; @@ -199,9 +211,9 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, unwind_next_frame(&state); /* if the frame has entry regs, print them */ - regs = unwind_get_entry_regs(&state); + regs = unwind_get_entry_regs(&state, &partial); if (regs) - show_regs_safe(&stack_info, regs); + show_regs_if_on_stack(&stack_info, regs, partial); } if (stack_name) diff --git a/arch/x86/kernel/stacktrace.c b/arch/x86/kernel/stacktrace.c index 8dabd7bf1673..60244bfaf88f 100644 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -98,7 +98,7 @@ static int __save_stack_trace_reliable(struct stack_trace *trace, for (unwind_start(&state, task, NULL, NULL); !unwind_done(&state); unwind_next_frame(&state)) { - regs = unwind_get_entry_regs(&state); + regs = unwind_get_entry_regs(&state, NULL); if (regs) { /* * Kernel mode registers on the stack indicate an From 89c96233a9ed8ee76d6e8e6a13ea19b92971d33f Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Sun, 31 Dec 2017 10:18:07 -0600 Subject: [PATCH 0958/1013] x86/dumpstack: Print registers for first stack frame MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3ffdeb1a02be3086f1411a15c5b9c481fa28e21f upstream. In the stack dump code, if the frame after the starting pt_regs is also a regs frame, the registers don't get printed. Fix that. Reported-by: Andy Lutomirski Tested-by: Alexander Tsoy Signed-off-by: Josh Poimboeuf Cc: Andy Lutomirski Cc: Linus Torvalds Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Toralf Förster Fixes: 3b3fa11bc700 ("x86/dumpstack: Print any pt_regs found on the stack") Link: http://lkml.kernel.org/r/396f84491d2f0ef64eda4217a2165f5712f6a115.1514736742.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/dumpstack.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index d0bb176a7261..afbecff161d1 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -115,6 +115,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, unwind_start(&state, task, regs, stack); stack = stack ? : get_stack_pointer(task, regs); + regs = unwind_get_entry_regs(&state, &partial); /* * Iterate through the stacks, starting with the current stack pointer. @@ -132,7 +133,7 @@ void show_trace_log_lvl(struct task_struct *task, struct pt_regs *regs, * - hardirq stack * - entry stack */ - for (regs = NULL; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { + for ( ; stack; stack = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { const char *stack_name; if (get_stack_info(stack, task, &stack_info, &visit_mask)) { From 7930cb29fb5feb6d18ffc20a83a1d3e5afac7a8a Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 3 Jan 2018 19:52:04 +0100 Subject: [PATCH 0959/1013] x86/pti: Switch to kernel CR3 at early in entry_SYSCALL_compat() commit d7732ba55c4b6a2da339bb12589c515830cfac2c upstream. The preparation for PTI which added CR3 switching to the entry code misplaced the CR3 switch in entry_SYSCALL_compat(). With PTI enabled the entry code tries to access a per cpu variable after switching to kernel GS. This fails because that variable is not mapped to user space. This results in a double fault and in the worst case a kernel crash. Move the switch ahead of the access and clobber RSP which has been saved already. 
Fixes: 8a09317b895f ("x86/mm/pti: Prepare the x86/entry assembly code for entry/exit CR3 switching") Reported-by: Lars Wendler Reported-by: Laura Abbott Signed-off-by: Thomas Gleixner Cc: Borislav Betkov Cc: Andy Lutomirski , Cc: Dave Hansen , Cc: Peter Zijlstra , Cc: Greg KH , , Cc: Boris Ostrovsky , Cc: Juergen Gross Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801031949200.1957@nanos Signed-off-by: Greg Kroah-Hartman --- arch/x86/entry/entry_64_compat.S | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S index 40f17009ec20..98d5358e4041 100644 --- a/arch/x86/entry/entry_64_compat.S +++ b/arch/x86/entry/entry_64_compat.S @@ -190,8 +190,13 @@ ENTRY(entry_SYSCALL_compat) /* Interrupts are off on entry. */ swapgs - /* Stash user ESP and switch to the kernel stack. */ + /* Stash user ESP */ movl %esp, %r8d + + /* Use %rsp as scratch reg. User ESP is stashed in r8 */ + SWITCH_TO_KERNEL_CR3 scratch_reg=%rsp + + /* Switch to the kernel stack */ movq PER_CPU_VAR(cpu_current_top_of_stack), %rsp /* Construct struct pt_regs on stack */ @@ -219,12 +224,6 @@ GLOBAL(entry_SYSCALL_compat_after_hwframe) pushq $0 /* pt_regs->r14 = 0 */ pushq $0 /* pt_regs->r15 = 0 */ - /* - * We just saved %rdi so it is safe to clobber. It is not - * preserved during the C calls inside TRACE_IRQS_OFF anyway. - */ - SWITCH_TO_KERNEL_CR3 scratch_reg=%rdi - /* * User mode is traced as though IRQs are on, and SYSENTER * turned them off. From 88dff1ab04d2792492e3f7e166d039f0dc4e1a09 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Wed, 3 Jan 2018 12:39:52 -0800 Subject: [PATCH 0960/1013] x86/process: Define cpu_tss_rw in same section as declaration commit 2fd9c41aea47f4ad071accf94b94f94f2c4d31eb upstream. cpu_tss_rw is declared with DECLARE_PER_CPU_PAGE_ALIGNED but then defined with DEFINE_PER_CPU_SHARED_ALIGNED leading to section mismatch warnings. Use DEFINE_PER_CPU_PAGE_ALIGNED consistently. This is necessary because it's mapped to the cpu entry area and must be page aligned. [ tglx: Massaged changelog a bit ] Fixes: 1a935bc3d4ea ("x86/entry: Move SYSENTER_stack to the beginning of struct tss_struct") Suggested-by: Thomas Gleixner Signed-off-by: Nick Desaulniers Signed-off-by: Thomas Gleixner Cc: thomas.lendacky@amd.com Cc: Borislav Petkov Cc: tklauser@distanz.ch Cc: minipli@googlemail.com Cc: me@kylehuey.com Cc: namit@vmware.com Cc: luto@kernel.org Cc: jpoimboe@redhat.com Cc: tj@kernel.org Cc: cl@linux.com Cc: bp@suse.de Cc: thgarnie@google.com Cc: kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/20180103203954.183360-1-ndesaulniers@google.com Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/process.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 517415978409..3cb2486c47e4 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -47,7 +47,7 @@ * section. Since TSS's are completely CPU-local, we want them * on exact cacheline boundaries, to eliminate cacheline ping-pong. 
*/ -__visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, cpu_tss_rw) = { +__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = { .x86_tss = { /* * .sp0 is only used when entering ring 0 from a lower From 2d01ac8cc12b973668bf898b03bf9ffb12d83b83 Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Wed, 15 Nov 2017 06:40:57 +0100 Subject: [PATCH 0961/1013] Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find." commit 94802151894d482e82c324edf2c658f8e6b96508 upstream. This reverts commit c9f3f813d462c72dbe412cee6a5cbacf13c4ad5e. This commit breaks transport mode when the policy template has widlcard addresses configured, so revert it. Signed-off-by: Steffen Klassert Cc: From: Derek Robson Signed-off-by: Greg Kroah-Hartman --- net/xfrm/xfrm_policy.c | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 2a6093840e7e..6bc16bb61b55 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -1362,29 +1362,36 @@ xfrm_tmpl_resolve_one(struct xfrm_policy *policy, const struct flowi *fl, struct net *net = xp_net(policy); int nx; int i, error; + xfrm_address_t *daddr = xfrm_flowi_daddr(fl, family); + xfrm_address_t *saddr = xfrm_flowi_saddr(fl, family); xfrm_address_t tmp; for (nx = 0, i = 0; i < policy->xfrm_nr; i++) { struct xfrm_state *x; - xfrm_address_t *local; - xfrm_address_t *remote; + xfrm_address_t *remote = daddr; + xfrm_address_t *local = saddr; struct xfrm_tmpl *tmpl = &policy->xfrm_vec[i]; - remote = &tmpl->id.daddr; - local = &tmpl->saddr; - if (xfrm_addr_any(local, tmpl->encap_family)) { - error = xfrm_get_saddr(net, fl->flowi_oif, - &tmp, remote, - tmpl->encap_family, 0); - if (error) - goto fail; - local = &tmp; + if (tmpl->mode == XFRM_MODE_TUNNEL || + tmpl->mode == XFRM_MODE_BEET) { + remote = &tmpl->id.daddr; + local = &tmpl->saddr; + if (xfrm_addr_any(local, tmpl->encap_family)) { + error = xfrm_get_saddr(net, fl->flowi_oif, + &tmp, remote, + tmpl->encap_family, 0); + if (error) + goto fail; + local = &tmp; + } } x = xfrm_state_find(remote, local, fl, tmpl, policy, &error, family); if (x && x->km.state == XFRM_STATE_VALID) { xfrm[nx++] = x; + daddr = remote; + saddr = local; continue; } if (x) { From 56073ba4e7068a737a225631b5b642a2cd105b7a Mon Sep 17 00:00:00 2001 From: Troy Kisky Date: Thu, 2 Nov 2017 18:58:12 -0700 Subject: [PATCH 0962/1013] rtc: m41t80: m41t80_sqw_set_rate should return 0 on success commit de6042d2fa8afe22b76e3c68fd6e9584c9415a3b upstream. Previously it was returning -EINVAL upon success. Signed-off-by: Troy Kisky Signed-off-by: Alexandre Belloni Cc: Christoph Fritz Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-m41t80.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/rtc/rtc-m41t80.c b/drivers/rtc/rtc-m41t80.c index f4c070ea8384..8f5843169dc2 100644 --- a/drivers/rtc/rtc-m41t80.c +++ b/drivers/rtc/rtc-m41t80.c @@ -510,10 +510,7 @@ static int m41t80_sqw_set_rate(struct clk_hw *hw, unsigned long rate, reg = (reg & 0x0f) | (val << 4); ret = i2c_smbus_write_byte_data(client, reg_sqw, reg); - if (ret < 0) - return ret; - - return -EINVAL; + return ret; } static int m41t80_sqw_control(struct clk_hw *hw, bool enable) From 8932255e96ecd172783c2c5a3a414066bda844cf Mon Sep 17 00:00:00 2001 From: Troy Kisky Date: Thu, 2 Nov 2017 18:58:13 -0700 Subject: [PATCH 0963/1013] rtc: m41t80: fix m41t80_sqw_round_rate return value commit c8384bb04261b9d32fe7402a6068ddaf38913b23 upstream. 
Previously it was returning the best of 32768, 8192, 1024, 64, 2, 0 Now, best of 32768, 8192, 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8, 4, 2, 1, 0 Signed-off-by: Troy Kisky Signed-off-by: Alexandre Belloni Cc: Christoph Fritz Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-m41t80.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/drivers/rtc/rtc-m41t80.c b/drivers/rtc/rtc-m41t80.c index 8f5843169dc2..42fc735a5446 100644 --- a/drivers/rtc/rtc-m41t80.c +++ b/drivers/rtc/rtc-m41t80.c @@ -468,18 +468,13 @@ static unsigned long m41t80_sqw_recalc_rate(struct clk_hw *hw, static long m41t80_sqw_round_rate(struct clk_hw *hw, unsigned long rate, unsigned long *prate) { - int i, freq = M41T80_SQW_MAX_FREQ; - - if (freq <= rate) - return freq; - - for (i = 2; i <= ilog2(M41T80_SQW_MAX_FREQ); i++) { - freq /= 1 << i; - if (freq <= rate) - return freq; - } - - return 0; + if (rate >= M41T80_SQW_MAX_FREQ) + return M41T80_SQW_MAX_FREQ; + if (rate >= M41T80_SQW_MAX_FREQ / 4) + return M41T80_SQW_MAX_FREQ / 4; + if (!rate) + return 0; + return 1 << ilog2(rate); } static int m41t80_sqw_set_rate(struct clk_hw *hw, unsigned long rate, From 11118640ae1705c57699a394d0754d4c699772c6 Mon Sep 17 00:00:00 2001 From: Troy Kisky Date: Thu, 2 Nov 2017 18:58:14 -0700 Subject: [PATCH 0964/1013] rtc: m41t80: avoid i2c read in m41t80_sqw_recalc_rate commit 2cb90ed3de1e279dbaf23df141f54eb9fb1861e6 upstream. This is a little more efficient, and avoids the warning WARNING: possible circular locking dependency detected 4.14.0-rc7-00007 #14 Not tainted ------------------------------------------------------ alsactl/330 is trying to acquire lock: (prepare_lock){+.+.}, at: [] clk_prepare_lock+0x80/0xf4 but task is already holding lock: (i2c_register_adapter){+.+.}, at: [] i2c_adapter_lock_bus+0x14/0x18 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (i2c_register_adapter){+.+.}: rt_mutex_lock+0x44/0x5c i2c_adapter_lock_bus+0x14/0x18 i2c_transfer+0xa8/0xbc i2c_smbus_xfer+0x20c/0x5d8 i2c_smbus_read_byte_data+0x38/0x48 m41t80_sqw_recalc_rate+0x24/0x58 Signed-off-by: Troy Kisky Signed-off-by: Alexandre Belloni Cc: Christoph Fritz Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-m41t80.c | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) diff --git a/drivers/rtc/rtc-m41t80.c b/drivers/rtc/rtc-m41t80.c index 42fc735a5446..f44dcf628c87 100644 --- a/drivers/rtc/rtc-m41t80.c +++ b/drivers/rtc/rtc-m41t80.c @@ -154,6 +154,7 @@ struct m41t80_data { struct rtc_device *rtc; #ifdef CONFIG_COMMON_CLK struct clk_hw sqw; + unsigned long freq; #endif }; @@ -443,26 +444,28 @@ static SIMPLE_DEV_PM_OPS(m41t80_pm, m41t80_suspend, m41t80_resume); #ifdef CONFIG_COMMON_CLK #define sqw_to_m41t80_data(_hw) container_of(_hw, struct m41t80_data, sqw) -static unsigned long m41t80_sqw_recalc_rate(struct clk_hw *hw, - unsigned long parent_rate) +static unsigned long m41t80_decode_freq(int setting) +{ + return (setting == 0) ? 0 : (setting == 1) ? M41T80_SQW_MAX_FREQ : + M41T80_SQW_MAX_FREQ >> setting; +} + +static unsigned long m41t80_get_freq(struct m41t80_data *m41t80) { - struct m41t80_data *m41t80 = sqw_to_m41t80_data(hw); struct i2c_client *client = m41t80->client; int reg_sqw = (m41t80->features & M41T80_FEATURE_SQ_ALT) ? 
M41T80_REG_WDAY : M41T80_REG_SQW; int ret = i2c_smbus_read_byte_data(client, reg_sqw); - unsigned long val = M41T80_SQW_MAX_FREQ; if (ret < 0) return 0; + return m41t80_decode_freq(ret >> 4); +} - ret >>= 4; - if (ret == 0) - val = 0; - else if (ret > 1) - val = val / (1 << ret); - - return val; +static unsigned long m41t80_sqw_recalc_rate(struct clk_hw *hw, + unsigned long parent_rate) +{ + return sqw_to_m41t80_data(hw)->freq; } static long m41t80_sqw_round_rate(struct clk_hw *hw, unsigned long rate, @@ -505,6 +508,8 @@ static int m41t80_sqw_set_rate(struct clk_hw *hw, unsigned long rate, reg = (reg & 0x0f) | (val << 4); ret = i2c_smbus_write_byte_data(client, reg_sqw, reg); + if (!ret) + m41t80->freq = m41t80_decode_freq(val); return ret; } @@ -579,6 +584,7 @@ static struct clk *m41t80_sqw_register_clk(struct m41t80_data *m41t80) init.parent_names = NULL; init.num_parents = 0; m41t80->sqw.init = &init; + m41t80->freq = m41t80_get_freq(m41t80); /* optional override of the clockname */ of_property_read_string(node, "clock-output-names", &init.name); From 7623bd95a6dccf74e30a21e437f2f789fbbe27f7 Mon Sep 17 00:00:00 2001 From: Troy Kisky Date: Thu, 2 Nov 2017 18:58:15 -0700 Subject: [PATCH 0965/1013] rtc: m41t80: avoid i2c read in m41t80_sqw_is_prepared commit 13bb1d78f2e372ec0d9b30489ac63768240140fc upstream. This is a little more efficient and avoids the warning WARNING: possible circular locking dependency detected 4.14.0-rc7-00010 #16 Not tainted ------------------------------------------------------ kworker/2:1/70 is trying to acquire lock: (prepare_lock){+.+.}, at: [] clk_prepare_lock+0x80/0xf4 but task is already holding lock: (i2c_register_adapter){+.+.}, at: [] i2c_adapter_lock_bus+0x14/0x18 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (i2c_register_adapter){+.+.}: rt_mutex_lock+0x44/0x5c i2c_adapter_lock_bus+0x14/0x18 i2c_transfer+0xa8/0xbc i2c_smbus_xfer+0x20c/0x5d8 i2c_smbus_read_byte_data+0x38/0x48 m41t80_sqw_is_prepared+0x18/0x28 Signed-off-by: Troy Kisky Signed-off-by: Alexandre Belloni Cc: Christoph Fritz Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-m41t80.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/drivers/rtc/rtc-m41t80.c b/drivers/rtc/rtc-m41t80.c index f44dcf628c87..96a606d5f6e6 100644 --- a/drivers/rtc/rtc-m41t80.c +++ b/drivers/rtc/rtc-m41t80.c @@ -155,6 +155,7 @@ struct m41t80_data { #ifdef CONFIG_COMMON_CLK struct clk_hw sqw; unsigned long freq; + unsigned int sqwe; #endif }; @@ -527,7 +528,10 @@ static int m41t80_sqw_control(struct clk_hw *hw, bool enable) else ret &= ~M41T80_ALMON_SQWE; - return i2c_smbus_write_byte_data(client, M41T80_REG_ALARM_MON, ret); + ret = i2c_smbus_write_byte_data(client, M41T80_REG_ALARM_MON, ret); + if (!ret) + m41t80->sqwe = enable; + return ret; } static int m41t80_sqw_prepare(struct clk_hw *hw) @@ -542,14 +546,7 @@ static void m41t80_sqw_unprepare(struct clk_hw *hw) static int m41t80_sqw_is_prepared(struct clk_hw *hw) { - struct m41t80_data *m41t80 = sqw_to_m41t80_data(hw); - struct i2c_client *client = m41t80->client; - int ret = i2c_smbus_read_byte_data(client, M41T80_REG_ALARM_MON); - - if (ret < 0) - return ret; - - return !!(ret & M41T80_ALMON_SQWE); + return sqw_to_m41t80_data(hw)->sqwe; } static const struct clk_ops m41t80_sqw_ops = { From 566fb9906ee25a697f4b694dd624398712adf00b Mon Sep 17 00:00:00 2001 From: Troy Kisky Date: Thu, 2 Nov 2017 18:58:16 -0700 Subject: [PATCH 0966/1013] rtc: m41t80: remove unneeded checks from 
m41t80_sqw_set_rate commit 05a03bf260e0480bfc0db91b1fdbc2115e3f193b upstream. m41t80_sqw_set_rate will be called with the result from m41t80_sqw_round_rate, so might as well make m41t80_sqw_set_rate(n) same as m41t80_sqw_set_rate(m41t80_sqw_round_rate(n)) As Russell King wrote[1], "clk_round_rate() is supposed to tell you what you end up with if you ask clk_set_rate() to set the exact same value you passed in - but clk_round_rate() won't modify the hardware." [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-January/080175.html Signed-off-by: Troy Kisky Signed-off-by: Alexandre Belloni Cc: Christoph Fritz Signed-off-by: Greg Kroah-Hartman --- drivers/rtc/rtc-m41t80.c | 17 ++++++----------- 1 file changed, 6 insertions(+), 11 deletions(-) diff --git a/drivers/rtc/rtc-m41t80.c b/drivers/rtc/rtc-m41t80.c index 96a606d5f6e6..c90fba3ed861 100644 --- a/drivers/rtc/rtc-m41t80.c +++ b/drivers/rtc/rtc-m41t80.c @@ -490,17 +490,12 @@ static int m41t80_sqw_set_rate(struct clk_hw *hw, unsigned long rate, M41T80_REG_WDAY : M41T80_REG_SQW; int reg, ret, val = 0; - if (rate) { - if (!is_power_of_2(rate)) - return -EINVAL; - val = ilog2(rate); - if (val == ilog2(M41T80_SQW_MAX_FREQ)) - val = 1; - else if (val < (ilog2(M41T80_SQW_MAX_FREQ) - 1)) - val = ilog2(M41T80_SQW_MAX_FREQ) - val; - else - return -EINVAL; - } + if (rate >= M41T80_SQW_MAX_FREQ) + val = 1; + else if (rate >= M41T80_SQW_MAX_FREQ / 4) + val = 2; + else if (rate) + val = 15 - ilog2(rate); reg = i2c_smbus_read_byte_data(client, reg_sqw); if (reg < 0) From 8d577afdee3540808302d9dc7a0a7be96c91178f Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 5 Jan 2018 15:48:59 +0100 Subject: [PATCH 0967/1013] Linux 4.14.12 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 655887067dc7..20f7d4de0f1c 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 11 +SUBLEVEL = 12 EXTRAVERSION = NAME = Petit Gorille From 7adf28df2f0bdeb2141f6e8edc2b746d4a6cc11f Mon Sep 17 00:00:00 2001 From: Andrey Ryabinin Date: Thu, 28 Dec 2017 19:06:20 +0300 Subject: [PATCH 0968/1013] x86/mm: Set MODULES_END to 0xffffffffff000000 commit f5a40711fa58f1c109165a4fec6078bf2dfd2bdc upstream. Since f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") kasan_mem_to_shadow(MODULES_END) could be not aligned to a page boundary. So passing page unaligned address to kasan_populate_zero_shadow() have two possible effects: 1) It may leave one page hole in supposed to be populated area. After commit 21506525fb8d ("x86/kasan/64: Teach KASAN about the cpu_entry_area") that hole happens to be in the shadow covering fixmap area and leads to crash: BUG: unable to handle kernel paging request at fffffbffffe8ee04 RIP: 0010:check_memory_region+0x5c/0x190 Call Trace: memcpy+0x1f/0x50 ghes_copy_tofrom_phys+0xab/0x180 ghes_read_estatus+0xfb/0x280 ghes_notify_nmi+0x2b2/0x410 nmi_handle+0x115/0x2c0 default_do_nmi+0x57/0x110 do_nmi+0xf8/0x150 end_repeat_nmi+0x1a/0x1e Note, the crash likely disappeared after commit 92a0f81d8957, which changed kasan_populate_zero_shadow() call the way it was before commit 21506525fb8d. 2) Attempt to load module near MODULES_END will fail, because __vmalloc_node_range() called from kasan_module_alloc() will hit the WARN_ON(!pte_none(*pte)) in the vmap_pte_range() and bail out with error. 
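The alignment requirement stated next follows from KASAN's shadow translation, which maps eight bytes of address space onto one shadow byte; as a rough sketch (assuming the usual x86 KASAN_SHADOW_SCALE_SHIFT of 3, with the offset irrelevant to the alignment argument):

	/* kasan_mem_to_shadow(), in spirit */
	shadow = (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;

	/* (8 * PAGE_SIZE) >> 3 == PAGE_SIZE, so an address aligned to
	 * 8 * PAGE_SIZE has a page-aligned shadow address. */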
To fix this we need to make kasan_mem_to_shadow(MODULES_END) page aligned which means that MODULES_END should be 8*PAGE_SIZE aligned. The whole point of commit f06bdd4001c2 was to move MODULES_END down if NR_CPUS is big, so the cpu_entry_area takes a lot of space. But since 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") the cpu_entry_area is no longer in fixmap, so we could just set MODULES_END to a fixed 8*PAGE_SIZE aligned address. Fixes: f06bdd4001c2 ("x86/mm: Adapt MODULES_END based on fixmap section size") Reported-by: Jakub Kicinski Signed-off-by: Andrey Ryabinin Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Thomas Garnier Link: https://lkml.kernel.org/r/20171228160620.23818-1-aryabinin@virtuozzo.com Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 5 +---- arch/x86/include/asm/pgtable_64_types.h | 2 +- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index ad41b3813f0a..ddd5ffd31bd0 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -43,7 +43,7 @@ ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space ... unused hole ... ffffffff80000000 - ffffffff9fffffff (=512 MB) kernel text mapping, from phys 0 -ffffffffa0000000 - [fixmap start] (~1526 MB) module mapping space +ffffffffa0000000 - fffffffffeffffff (1520 MB) module mapping space [fixmap start] - ffffffffff5fffff kernel-internal fixmap range ffffffffff600000 - ffffffffff600fff (=4 kB) legacy vsyscall ABI ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole @@ -67,9 +67,6 @@ memory window (this size is arbitrary, it can be raised later if needed). The mappings are not part of any other kernel PGD and are only available during EFI runtime calls. -The module mapping space size changes based on the CONFIG requirements for the -following fixmap section. - Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all physical memory, vmalloc/ioremap space and virtual memory map are randomized. Their order is preserved but their base will be offset early at boot time. diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index b97a539bcdee..6233e5595389 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -104,7 +104,7 @@ typedef struct { pteval_t pte; } pte_t; #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) /* The module sections ends with the start of the fixmap */ -#define MODULES_END __fix_to_virt(__end_of_fixed_addresses + 1) +#define MODULES_END _AC(0xffffffffff000000, UL) #define MODULES_LEN (MODULES_END - MODULES_VADDR) #define ESPFIX_PGD_ENTRY _AC(-2, UL) From 1af9b74bfa59250f69578f6e6424aae7cb46d063 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 4 Jan 2018 13:01:40 +0100 Subject: [PATCH 0969/1013] x86/mm: Map cpu_entry_area at the same place on 4/5 level commit f2078904810373211fb15f91888fba14c01a4acc upstream. There is no reason for 4 and 5 level pagetables to have a different layout. It just makes determining vaddr_end for KASLR harder than necessary. 
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Signed-off-by: Thomas Gleixner Cc: Andy Lutomirski Cc: Benjamin Gilbert Cc: Greg Kroah-Hartman Cc: Dave Hansen Cc: Peter Zijlstra Cc: Thomas Garnier Cc: Alexander Kuleshov Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 7 ++++--- arch/x86/include/asm/pgtable_64_types.h | 4 ++-- arch/x86/mm/dump_pagetables.c | 2 +- 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index ddd5ffd31bd0..f7dabe1f01e9 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -12,8 +12,8 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) ... unused hole ... ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB) ... unused hole ... -fffffe0000000000 - fffffe7fffffffff (=39 bits) LDT remap for PTI -fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping +fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping +fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space @@ -37,7 +37,8 @@ ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) ... unused hole ... ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB) ... unused hole ... -fffffe8000000000 - fffffeffffffffff (=39 bits) cpu_entry_area mapping +fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping +... unused hole ... ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks ... unused hole ... ffffffef00000000 - fffffffeffffffff (=64 GB) EFI region mapping space diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 6233e5595389..61b4b60bdc13 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -88,7 +88,7 @@ typedef struct { pteval_t pte; } pte_t; # define VMALLOC_SIZE_TB _AC(32, UL) # define __VMALLOC_BASE _AC(0xffffc90000000000, UL) # define __VMEMMAP_BASE _AC(0xffffea0000000000, UL) -# define LDT_PGD_ENTRY _AC(-4, UL) +# define LDT_PGD_ENTRY _AC(-3, UL) # define LDT_BASE_ADDR (LDT_PGD_ENTRY << PGDIR_SHIFT) #endif @@ -110,7 +110,7 @@ typedef struct { pteval_t pte; } pte_t; #define ESPFIX_PGD_ENTRY _AC(-2, UL) #define ESPFIX_BASE_ADDR (ESPFIX_PGD_ENTRY << P4D_SHIFT) -#define CPU_ENTRY_AREA_PGD _AC(-3, UL) +#define CPU_ENTRY_AREA_PGD _AC(-4, UL) #define CPU_ENTRY_AREA_BASE (CPU_ENTRY_AREA_PGD << P4D_SHIFT) #define EFI_VA_START ( -4 * (_AC(1, UL) << 30)) diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index f56902c1f04b..2a4849e92831 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -61,10 +61,10 @@ enum address_markers_idx { KASAN_SHADOW_START_NR, KASAN_SHADOW_END_NR, #endif + CPU_ENTRY_AREA_NR, #if defined(CONFIG_MODIFY_LDT_SYSCALL) && !defined(CONFIG_X86_5LEVEL) LDT_NR, #endif - CPU_ENTRY_AREA_NR, #ifdef CONFIG_X86_ESPFIX64 ESPFIX_START_NR, #endif From 67f67244f80a37fba03c9e6011853dba94952c7c Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 4 Jan 2018 12:32:03 +0100 Subject: [PATCH 0970/1013] x86/kaslr: Fix the vaddr_end mess commit 1dddd25125112ba49706518ac9077a1026a18f37 upstream. 
vaddr_end for KASLR is only documented in the KASLR code itself and is adjusted depending on config options. So it's not surprising that a change of the memory layout causes KASLR to have the wrong vaddr_end. This can map arbitrary stuff into other areas causing hard to understand problems. Remove the whole ifdef magic and define the start of the cpu_entry_area to be the end of the KASLR vaddr range. Add documentation to that effect. Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap") Reported-by: Benjamin Gilbert Signed-off-by: Thomas Gleixner Tested-by: Benjamin Gilbert Cc: Andy Lutomirski Cc: Greg Kroah-Hartman Cc: Dave Hansen Cc: Peter Zijlstra Cc: Thomas Garnier Cc: Alexander Kuleshov Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801041320360.1771@nanos Signed-off-by: Greg Kroah-Hartman --- Documentation/x86/x86_64/mm.txt | 6 +++++ arch/x86/include/asm/pgtable_64_types.h | 8 ++++++- arch/x86/mm/kaslr.c | 32 +++++++------------------ 3 files changed, 22 insertions(+), 24 deletions(-) diff --git a/Documentation/x86/x86_64/mm.txt b/Documentation/x86/x86_64/mm.txt index f7dabe1f01e9..ea91cb61a602 100644 --- a/Documentation/x86/x86_64/mm.txt +++ b/Documentation/x86/x86_64/mm.txt @@ -12,6 +12,7 @@ ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB) ... unused hole ... ffffec0000000000 - fffffbffffffffff (=44 bits) kasan shadow memory (16TB) ... unused hole ... + vaddr_end for KASLR fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping fffffe8000000000 - fffffeffffffffff (=39 bits) LDT remap for PTI ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks @@ -37,6 +38,7 @@ ffd4000000000000 - ffd5ffffffffffff (=49 bits) virtual memory map (512TB) ... unused hole ... ffdf000000000000 - fffffc0000000000 (=53 bits) kasan shadow memory (8PB) ... unused hole ... + vaddr_end for KASLR fffffe0000000000 - fffffe7fffffffff (=39 bits) cpu_entry_area mapping ... unused hole ... ffffff0000000000 - ffffff7fffffffff (=39 bits) %esp fixup stacks @@ -71,3 +73,7 @@ during EFI runtime calls. Note that if CONFIG_RANDOMIZE_MEMORY is enabled, the direct mapping of all physical memory, vmalloc/ioremap space and virtual memory map are randomized. Their order is preserved but their base will be offset early at boot time. + +Be very careful vs. KASLR when changing anything here. The KASLR address +range must not overlap with anything except the KASAN shadow area, which is +correct as KASAN disables KASLR. diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 61b4b60bdc13..6b8f73dcbc2c 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -75,7 +75,13 @@ typedef struct { pteval_t pte; } pte_t; #define PGDIR_SIZE (_AC(1, UL) << PGDIR_SHIFT) #define PGDIR_MASK (~(PGDIR_SIZE - 1)) -/* See Documentation/x86/x86_64/mm.txt for a description of the memory map. */ +/* + * See Documentation/x86/x86_64/mm.txt for a description of the memory map. + * + * Be very careful vs. KASLR when changing anything here. The KASLR address + * range must not overlap with anything except the KASAN shadow area, which + * is correct as KASAN disables KASLR. + */ #define MAXMEM _AC(__AC(1, UL) << MAX_PHYSMEM_BITS, UL) #ifdef CONFIG_X86_5LEVEL diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 879ef930e2c2..aedebd2ebf1e 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -34,25 +34,14 @@ #define TB_SHIFT 40 /* - * Virtual address start and end range for randomization. 
The end changes base - * on configuration to have the highest amount of space for randomization. - * It increases the possible random position for each randomized region. + * Virtual address start and end range for randomization. * - * You need to add an if/def entry if you introduce a new memory region - * compatible with KASLR. Your entry must be in logical order with memory - * layout. For example, ESPFIX is before EFI because its virtual address is - * before. You also need to add a BUILD_BUG_ON() in kernel_randomize_memory() to - * ensure that this order is correct and won't be changed. + * The end address could depend on more configuration options to make the + * highest amount of space for randomization available, but that's too hard + * to keep straight and caused issues already. */ static const unsigned long vaddr_start = __PAGE_OFFSET_BASE; - -#if defined(CONFIG_X86_ESPFIX64) -static const unsigned long vaddr_end = ESPFIX_BASE_ADDR; -#elif defined(CONFIG_EFI) -static const unsigned long vaddr_end = EFI_VA_END; -#else -static const unsigned long vaddr_end = __START_KERNEL_map; -#endif +static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE; /* Default values */ unsigned long page_offset_base = __PAGE_OFFSET_BASE; @@ -101,15 +90,12 @@ void __init kernel_randomize_memory(void) unsigned long remain_entropy; /* - * All these BUILD_BUG_ON checks ensures the memory layout is - * consistent with the vaddr_start/vaddr_end variables. + * These BUILD_BUG_ON checks ensure the memory layout is consistent + * with the vaddr_start/vaddr_end variables. These checks are very + * limited.... */ BUILD_BUG_ON(vaddr_start >= vaddr_end); - BUILD_BUG_ON(IS_ENABLED(CONFIG_X86_ESPFIX64) && - vaddr_end >= EFI_VA_END); - BUILD_BUG_ON((IS_ENABLED(CONFIG_X86_ESPFIX64) || - IS_ENABLED(CONFIG_EFI)) && - vaddr_end >= __START_KERNEL_map); + BUILD_BUG_ON(vaddr_end != CPU_ENTRY_AREA_BASE); BUILD_BUG_ON(vaddr_end > __START_KERNEL_map); if (!kaslr_memory_enabled()) From b4660939afccbe1d2af5d066d1337f1b29c87ae8 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 4 Jan 2018 18:07:12 +0100 Subject: [PATCH 0971/1013] x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers commit 42f3bdc5dd962a5958bc024c1e1444248a6b8b4a upstream. Thomas reported the following warning: BUG: using smp_processor_id() in preemptible [00000000] code: ovsdb-server/4498 caller is native_flush_tlb_single+0x57/0xc0 native_flush_tlb_single+0x57/0xc0 __set_pte_vaddr+0x2d/0x40 set_pte_vaddr+0x2f/0x40 cea_set_pte+0x30/0x40 ds_update_cea.constprop.4+0x4d/0x70 reserve_ds_buffers+0x159/0x410 x86_reserve_hardware+0x150/0x160 x86_pmu_event_init+0x3e/0x1f0 perf_try_init_event+0x69/0x80 perf_event_alloc+0x652/0x740 SyS_perf_event_open+0x3f6/0xd60 do_syscall_64+0x5c/0x190 set_pte_vaddr is used to map the ds buffers into the cpu entry area, but there are two problems with that: 1) The resulting flush is not supposed to be called in preemptible context 2) The cpu entry area is supposed to be per CPU, but the debug store buffers are mapped for all CPUs so these mappings need to be flushed globally. Add the necessary preemption protection across the mapping code and flush TLBs globally. 
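The shape of the fix, in isolation (a sketch reusing the existing cea_set_pte() helper; the real hunk follows below): the debug store pages live in the shared cpu_entry_area, so the PTE writes must not race with migration, and the stale translations have to be shot down on every CPU rather than just the local one.

	static void update_cea_range(void *cea, phys_addr_t pa, size_t size,
				     pgprot_t prot)
	{
		unsigned long start = (unsigned long)cea;
		size_t off;

		preempt_disable();	/* keep cea_set_pte() on one CPU */
		for (off = 0; off < size; off += PAGE_SIZE)
			cea_set_pte(cea + off, pa + off, prot);
		/* shared kernel mapping: a local flush is not enough */
		flush_tlb_kernel_range(start, start + size);
		preempt_enable();
	}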
Fixes: c1961a4631da ("x86/events/intel/ds: Map debug buffers in cpu_entry_area") Reported-by: Thomas Zeitlhofer Signed-off-by: Peter Zijlstra Signed-off-by: Thomas Gleixner Tested-by: Thomas Zeitlhofer Cc: Greg Kroah-Hartman Cc: Hugh Dickins Link: https://lkml.kernel.org/r/20180104170712.GB3040@hirez.programming.kicks-ass.net Signed-off-by: Greg Kroah-Hartman --- arch/x86/events/intel/ds.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c index 8f0aace08b87..8156e47da7ba 100644 --- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -5,6 +5,7 @@ #include #include +#include #include #include "../perf_event.h" @@ -283,20 +284,35 @@ static DEFINE_PER_CPU(void *, insn_buffer); static void ds_update_cea(void *cea, void *addr, size_t size, pgprot_t prot) { + unsigned long start = (unsigned long)cea; phys_addr_t pa; size_t msz = 0; pa = virt_to_phys(addr); + + preempt_disable(); for (; msz < size; msz += PAGE_SIZE, pa += PAGE_SIZE, cea += PAGE_SIZE) cea_set_pte(cea, pa, prot); + + /* + * This is a cross-CPU update of the cpu_entry_area, we must shoot down + * all TLB entries for it. + */ + flush_tlb_kernel_range(start, start + size); + preempt_enable(); } static void ds_clear_cea(void *cea, size_t size) { + unsigned long start = (unsigned long)cea; size_t msz = 0; + preempt_disable(); for (; msz < size; msz += PAGE_SIZE, cea += PAGE_SIZE) cea_set_pte(cea, 0, PAGE_NONE); + + flush_tlb_kernel_range(start, start + size); + preempt_enable(); } static void *dsalloc_pages(size_t size, gfp_t flags, int cpu) From 98f42e3f849412895171d096b901bfa3ff0fe004 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 4 Jan 2018 22:19:04 +0100 Subject: [PATCH 0972/1013] x86/tlb: Drop the _GPL from the cpu_tlbstate export commit 1e5476815fd7f98b888e01a0f9522b63085f96c9 upstream. The recent changes for PTI touch cpu_tlbstate from various tlb_flush inlines. cpu_tlbstate is exported as GPL symbol, so this causes a regression when building out of tree drivers for certain graphics cards. Aside of that the export was wrong since it was introduced as it should have been EXPORT_PER_CPU_SYMBOL_GPL(). Use the correct PER_CPU export and drop the _GPL to restore the previous state which allows users to utilize the cards they payed for. As always I'm really thrilled to make this kind of change to support the #friends (or however the hot hashtag of today is spelled) from that closet sauce graphics corp. 
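The visible effect is at module link time. A hypothetical non-GPL out-of-tree module along the following lines only links once cpu_tlbstate is exported with the per-CPU, non-GPL macro, because the tlbflush.h inlines it pulls in reference the symbol (which code path actually touches it at runtime depends on CPU features):

	#include <linux/module.h>
	#include <asm/tlbflush.h>

	static int __init blob_init(void)
	{
		/* inline expansion references cpu_tlbstate (e.g. its cr4 copy) */
		__flush_tlb_all();
		return 0;
	}
	module_init(blob_init);

	MODULE_LICENSE("Proprietary");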
Fixes: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Kees Cook Signed-off-by: Thomas Gleixner Cc: Greg Kroah-Hartman Cc: Peter Zijlstra Cc: Andy Lutomirski Signed-off-by: Greg Kroah-Hartman --- arch/x86/mm/init.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 80259ad8c386..6b462a472a7b 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -870,7 +870,7 @@ __visible DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = { .next_asid = 1, .cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */ }; -EXPORT_SYMBOL_GPL(cpu_tlbstate); +EXPORT_PER_CPU_SYMBOL(cpu_tlbstate); void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache) { From 83c7e4977221da397ac37119c50a1b21cc6a6677 Mon Sep 17 00:00:00 2001 From: David Woodhouse Date: Thu, 4 Jan 2018 14:37:05 +0000 Subject: [PATCH 0973/1013] x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm commit b9e705ef7cfaf22db0daab91ad3cd33b0fa32eb9 upstream. Where an ALTERNATIVE is used in the middle of an inline asm block, this would otherwise lead to the following instruction being appended directly to the trailing ".popsection", and a failed compile. Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse Signed-off-by: Thomas Gleixner Cc: gnomes@lxorguk.ukuu.org.uk Cc: Rik van Riel Cc: ak@linux.intel.com Cc: Tim Chen Cc: Peter Zijlstra Cc: Paul Turner Cc: Jiri Kosina Cc: Andy Lutomirski Cc: Dave Hansen Cc: Kees Cook Cc: Linus Torvalds Cc: Greg Kroah-Hartman Link: https://lkml.kernel.org/r/20180104143710.8961-8-dwmw@amazon.co.uk Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/alternative.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index dbfd0854651f..cf5961ca8677 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -140,7 +140,7 @@ static inline int alternatives_text_reserved(void *start, void *end) ".popsection\n" \ ".pushsection .altinstr_replacement, \"ax\"\n" \ ALTINSTR_REPLACEMENT(newinstr, feature, 1) \ - ".popsection" + ".popsection\n" #define ALTERNATIVE_2(oldinstr, newinstr1, feature1, newinstr2, feature2)\ OLDINSTR_2(oldinstr, 1, 2) \ @@ -151,7 +151,7 @@ static inline int alternatives_text_reserved(void *start, void *end) ".pushsection .altinstr_replacement, \"ax\"\n" \ ALTINSTR_REPLACEMENT(newinstr1, feature1, 1) \ ALTINSTR_REPLACEMENT(newinstr2, feature2, 2) \ - ".popsection" + ".popsection\n" /* * Alternative instructions for different CPU types or capabilities. From aea4ead5aba7d0990e007d55b806b1e7859a8bed Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 5 Jan 2018 15:27:34 +0100 Subject: [PATCH 0974/1013] x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN commit de791821c295cc61419a06fe5562288417d1bc58 upstream. Use the name associated with the particular attack which needs page table isolation for mitigation. 
Signed-off-by: Thomas Gleixner Acked-by: David Woodhouse Cc: Alan Cox Cc: Jiri Koshina Cc: Linus Torvalds Cc: Tim Chen Cc: Andi Lutomirski Cc: Andi Kleen Cc: Peter Zijlstra Cc: Paul Turner Cc: Tom Lendacky Cc: Greg KH Cc: Dave Hansen Cc: Kees Cook Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos Signed-off-by: Greg Kroah-Hartman --- arch/x86/include/asm/cpufeatures.h | 2 +- arch/x86/kernel/cpu/common.c | 2 +- arch/x86/mm/pti.c | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 07cdd1715705..21ac898df2d8 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -341,6 +341,6 @@ #define X86_BUG_SWAPGS_FENCE X86_BUG(11) /* SWAPGS without input dep on GS */ #define X86_BUG_MONITOR X86_BUG(12) /* IPI required to wake up remote CPU */ #define X86_BUG_AMD_E400 X86_BUG(13) /* CPU is among the affected by Erratum 400 */ -#define X86_BUG_CPU_INSECURE X86_BUG(14) /* CPU is insecure and needs kernel page table isolation */ +#define X86_BUG_CPU_MELTDOWN X86_BUG(14) /* CPU is affected by meltdown attack and needs kernel page table isolation */ #endif /* _ASM_X86_CPUFEATURES_H */ diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index b1be494ab4e8..2d3bd2215e5b 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -900,7 +900,7 @@ static void __init early_identify_cpu(struct cpuinfo_x86 *c) setup_force_cpu_cap(X86_FEATURE_ALWAYS); if (c->x86_vendor != X86_VENDOR_AMD) - setup_force_cpu_bug(X86_BUG_CPU_INSECURE); + setup_force_cpu_bug(X86_BUG_CPU_MELTDOWN); fpu__init_system(c); diff --git a/arch/x86/mm/pti.c b/arch/x86/mm/pti.c index 2da28ba97508..43d4a4a29037 100644 --- a/arch/x86/mm/pti.c +++ b/arch/x86/mm/pti.c @@ -56,13 +56,13 @@ static void __init pti_print_if_insecure(const char *reason) { - if (boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) + if (boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN)) pr_info("%s\n", reason); } static void __init pti_print_if_secure(const char *reason) { - if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) + if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN)) pr_info("%s\n", reason); } @@ -96,7 +96,7 @@ void __init pti_check_boottime_disable(void) } autosel: - if (!boot_cpu_has_bug(X86_BUG_CPU_INSECURE)) + if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN)) return; enable: setup_force_cpu_cap(X86_FEATURE_PTI); From ae04ca35247af576999da5ef726d1a03fc65de09 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Thu, 4 Jan 2018 16:17:49 -0800 Subject: [PATCH 0975/1013] kernel/acct.c: fix the acct->needcheck check in check_free_space() commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream. As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check is very wrong, we need time_is_after_jiffies() to make sys_acct() work. Ignoring the overflows, the code should "goto out" if needcheck > jiffies, while currently it checks "needcheck < jiffies" and thus in the likely case check_free_space() does nothing until jiffies overflow. In particular this means that sys_acct() is simply broken, acct_on() sets acct->needcheck = jiffies and expects that check_free_space() should set acct->active = 1 after the free-space check, but this won't happen if jiffies increments in between. This was broken by commit 32dc73086015 ("get rid of timer in kern/acct.c") in 2011, then another (correct) commit 795a2f22a8ea ("acct() should honour the limits from the very beginning") made the problem more visible. 
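The two helpers are easy to mix up, which is the whole bug:

	/*
	 * From include/linux/jiffies.h (paraphrased):
	 *
	 *	time_is_before_jiffies(a) == time_after(jiffies, a)	(a) already passed
	 *	time_is_after_jiffies(a)  == time_before(jiffies, a)	(a) still ahead
	 */

With that reading, time_is_after_jiffies(acct->needcheck) is the test that skips the statfs while the next check is still in the future, which is what acct_on() sets up.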
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com Fixes: 32dc73086015 ("get rid of timer in kern/acct.c") Reported-by: TSUKADA Koutaro Suggested-by: TSUKADA Koutaro Signed-off-by: Oleg Nesterov Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/acct.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/acct.c b/kernel/acct.c index 6670fbd3e466..354578d253d5 100644 --- a/kernel/acct.c +++ b/kernel/acct.c @@ -102,7 +102,7 @@ static int check_free_space(struct bsd_acct_struct *acct) { struct kstatfs sbuf; - if (time_is_before_jiffies(acct->needcheck)) + if (time_is_after_jiffies(acct->needcheck)) goto out; /* May block */ From abcc78627877cef68144475e88952b0957406024 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Thu, 4 Jan 2018 16:17:52 -0800 Subject: [PATCH 0976/1013] mm/mprotect: add a cond_resched() inside change_pmd_range() commit 4991c09c7c812dba13ea9be79a68b4565bb1fa4e upstream. While testing on a large CPU system, detected the following RCU stall many times over the span of the workload. This problem is solved by adding a cond_resched() in the change_pmd_range() function. INFO: rcu_sched detected stalls on CPUs/tasks: 154-....: (670 ticks this GP) idle=022/140000000000000/0 softirq=2825/2825 fqs=612 (detected by 955, t=6002 jiffies, g=4486, c=4485, q=90864) Sending NMI from CPU 955 to CPUs 154: NMI backtrace for cpu 154 CPU: 154 PID: 147071 Comm: workload Not tainted 4.15.0-rc3+ #3 NIP: c0000000000b3f64 LR: c0000000000b33d4 CTR: 000000000000aa18 REGS: 00000000a4b0fb44 TRAP: 0501 Not tainted (4.15.0-rc3+) MSR: 8000000000009033 CR: 22422082 XER: 00000000 CFAR: 00000000006cf8f0 SOFTE: 1 GPR00: 0010000000000000 c00003ef9b1cb8c0 c0000000010cc600 0000000000000000 GPR04: 8e0000018c32b200 40017b3858fd6e00 8e0000018c32b208 40017b3858fd6e00 GPR08: 8e0000018c32b210 40017b3858fd6e00 8e0000018c32b218 40017b3858fd6e00 GPR12: ffffffffffffffff c00000000fb25100 NIP [c0000000000b3f64] plpar_hcall9+0x44/0x7c LR [c0000000000b33d4] pSeries_lpar_flush_hash_range+0x384/0x420 Call Trace: flush_hash_range+0x48/0x100 __flush_tlb_pending+0x44/0xd0 hpte_need_flush+0x408/0x470 change_protection_range+0xaac/0xf10 change_prot_numa+0x30/0xb0 task_numa_work+0x2d0/0x3e0 task_work_run+0x130/0x190 do_notify_resume+0x118/0x120 ret_from_except_lite+0x70/0x74 Instruction dump: 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008 Link: http://lkml.kernel.org/r/20171214140551.5794-1-khandual@linux.vnet.ibm.com Signed-off-by: Anshuman Khandual Suggested-by: Nicholas Piggin Acked-by: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/mprotect.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/mprotect.c b/mm/mprotect.c index ec39f730a0bf..58b629bb70de 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -166,7 +166,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, next = pmd_addr_end(addr, end); if (!is_swap_pmd(*pmd) && !pmd_trans_huge(*pmd) && !pmd_devmap(*pmd) && pmd_none_or_clear_bad(pmd)) - continue; + goto next; /* invoke the mmu notifier if the pmd is populated */ if (!mni_start) { @@ -188,7 +188,7 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, } /* huge pmd was handled */ - continue; + goto next; } } /* fall through, the trans huge pmd just split */ @@ -196,6 +196,8 @@ static inline 
unsigned long change_pmd_range(struct vm_area_struct *vma, this_pages = change_pte_range(vma, pmd, addr, next, newprot, dirty_accountable, prot_numa); pages += this_pages; +next: + cond_resched(); } while (pmd++, addr = next, addr != end); if (mni_start) From c86ee796feed73dfa8a895be48c146b018d14393 Mon Sep 17 00:00:00 2001 From: Baoquan He Date: Thu, 4 Jan 2018 16:18:06 -0800 Subject: [PATCH 0977/1013] mm/sparse.c: wrong allocation for mem_section commit d09cfbbfa0f761a97687828b5afb27b56cbf2e19 upstream. In commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to save memory. It allocates the first dimension of array with sizeof(struct mem_section). It costs extra memory, should be sizeof(struct mem_section *). Fix it. Link: http://lkml.kernel.org/r/1513932498-20350-1-git-send-email-bhe@redhat.com Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Signed-off-by: Baoquan He Tested-by: Dave Young Acked-by: Kirill A. Shutemov Cc: Kirill A. Shutemov Cc: Ingo Molnar Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Atsushi Kumagai Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- mm/sparse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/sparse.c b/mm/sparse.c index 60805abf98af..30e56a100ee8 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -211,7 +211,7 @@ void __init memory_present(int nid, unsigned long start, unsigned long end) if (unlikely(!mem_section)) { unsigned long size, align; - size = sizeof(struct mem_section) * NR_SECTION_ROOTS; + size = sizeof(struct mem_section*) * NR_SECTION_ROOTS; align = 1 << (INTERNODE_CACHE_SHIFT); mem_section = memblock_virt_alloc(size, align); } From 319122a71f306943a75217b3e78a9a0f8e9c1e95 Mon Sep 17 00:00:00 2001 From: Andrea Arcangeli Date: Thu, 4 Jan 2018 16:18:09 -0800 Subject: [PATCH 0978/1013] userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails commit 0cbb4b4f4c44f54af268969b18d8deda63aded59 upstream. The previous fix in commit 384632e67e08 ("userfaultfd: non-cooperative: fix fork use after free") corrected the refcounting in case of UFFD_EVENT_FORK failure for the fork userfault paths. That still didn't clear the vma->vm_userfaultfd_ctx of the vmas that were set to point to the aborted new uffd ctx earlier in dup_userfaultfd. 
Link: http://lkml.kernel.org/r/20171223002505.593-2-aarcange@redhat.com Signed-off-by: Andrea Arcangeli Reported-by: syzbot Reviewed-by: Mike Rapoport Cc: Eric Biggers Cc: Dmitry Vyukov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- fs/userfaultfd.c | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 1c713fd5b3e6..5aa392eae1c3 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -570,11 +570,14 @@ int handle_userfault(struct vm_fault *vmf, unsigned long reason) static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, struct userfaultfd_wait_queue *ewq) { + struct userfaultfd_ctx *release_new_ctx; + if (WARN_ON_ONCE(current->flags & PF_EXITING)) goto out; ewq->ctx = ctx; init_waitqueue_entry(&ewq->wq, current); + release_new_ctx = NULL; spin_lock(&ctx->event_wqh.lock); /* @@ -601,8 +604,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, new = (struct userfaultfd_ctx *) (unsigned long) ewq->msg.arg.reserved.reserved1; - - userfaultfd_ctx_put(new); + release_new_ctx = new; } break; } @@ -617,6 +619,20 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, __set_current_state(TASK_RUNNING); spin_unlock(&ctx->event_wqh.lock); + if (release_new_ctx) { + struct vm_area_struct *vma; + struct mm_struct *mm = release_new_ctx->mm; + + /* the various vma->vm_userfaultfd_ctx still points to it */ + down_write(&mm->mmap_sem); + for (vma = mm->mmap; vma; vma = vma->vm_next) + if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) + vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; + up_write(&mm->mmap_sem); + + userfaultfd_ctx_put(release_new_ctx); + } + /* * ctx may go away after this if the userfault pseudo fd is * already released. From b9fa21f3daa6533ab2303caad72c996e3924cf3b Mon Sep 17 00:00:00 2001 From: Chris Mason Date: Fri, 15 Dec 2017 11:58:27 -0800 Subject: [PATCH 0979/1013] btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes commit ec35e48b286959991cdbb886f1bdeda4575c80b4 upstream. refcounts have a generic implementation and an asm optimized one. The generic version has extra debugging to make sure that once a refcount goes to zero, refcount_inc won't increase it. The btrfs delayed inode code wasn't expecting this, and we're tripping over the warnings when the generic refcounts are used. We ended up with this race: Process A Process B btrfs_get_delayed_node() spin_lock(root->inode_lock) radix_tree_lookup() __btrfs_release_delayed_node() refcount_dec_and_test(&delayed_node->refs) our refcount is now zero refcount_add(2) <--- warning here, refcount unchanged spin_lock(root->inode_lock) radix_tree_delete() With the generic refcounts, we actually warn again when process B above tries to release his refcount because refcount_add() turned into a no-op. We saw this in production on older kernels without the asm optimized refcounts. The fix used here is to use refcount_inc_not_zero() to detect when the object is in the middle of being freed and return NULL. This is almost always the right answer anyway, since we usually end up pitching the delayed_node if it didn't have fresh data in it. This also changes __btrfs_release_delayed_node() to remove the extra check for zero refcounts before radix tree deletion. btrfs_get_delayed_node() was the only path that was allowing refcounts to go from zero to one. 
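The lookup side of that race reduces to a standard pattern: find the object in the index under the lock, then only treat it as live if its refcount can be raised from a non-zero value. A condensed sketch of what btrfs_get_delayed_node() does after this patch (the second reference taken when caching the node in the inode is omitted):

	spin_lock(&root->inode_lock);
	node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
	if (node && !refcount_inc_not_zero(&node->refs)) {
		/*
		 * The refcount already hit zero, so the release path is about
		 * to delete the node from the radix tree.  Pretend it was
		 * never there instead of resurrecting it.
		 */
		node = NULL;
	}
	spin_unlock(&root->inode_lock);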
Fixes: 6de5f18e7b0da ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node") Signed-off-by: Chris Mason Reviewed-by: Liu Bo Signed-off-by: David Sterba Signed-off-by: Greg Kroah-Hartman --- fs/btrfs/delayed-inode.c | 45 ++++++++++++++++++++++++++++++---------- 1 file changed, 34 insertions(+), 11 deletions(-) diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c index 19e4ad2f3f2e..0c4b690cf761 100644 --- a/fs/btrfs/delayed-inode.c +++ b/fs/btrfs/delayed-inode.c @@ -87,6 +87,7 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node( spin_lock(&root->inode_lock); node = radix_tree_lookup(&root->delayed_nodes_tree, ino); + if (node) { if (btrfs_inode->delayed_node) { refcount_inc(&node->refs); /* can be accessed */ @@ -94,9 +95,30 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node( spin_unlock(&root->inode_lock); return node; } - btrfs_inode->delayed_node = node; - /* can be accessed and cached in the inode */ - refcount_add(2, &node->refs); + + /* + * It's possible that we're racing into the middle of removing + * this node from the radix tree. In this case, the refcount + * was zero and it should never go back to one. Just return + * NULL like it was never in the radix at all; our release + * function is in the process of removing it. + * + * Some implementations of refcount_inc refuse to bump the + * refcount once it has hit zero. If we don't do this dance + * here, refcount_inc() may decide to just WARN_ONCE() instead + * of actually bumping the refcount. + * + * If this node is properly in the radix, we want to bump the + * refcount twice, once for the inode and once for this get + * operation. + */ + if (refcount_inc_not_zero(&node->refs)) { + refcount_inc(&node->refs); + btrfs_inode->delayed_node = node; + } else { + node = NULL; + } + spin_unlock(&root->inode_lock); return node; } @@ -254,17 +276,18 @@ static void __btrfs_release_delayed_node( mutex_unlock(&delayed_node->mutex); if (refcount_dec_and_test(&delayed_node->refs)) { - bool free = false; struct btrfs_root *root = delayed_node->root; + spin_lock(&root->inode_lock); - if (refcount_read(&delayed_node->refs) == 0) { - radix_tree_delete(&root->delayed_nodes_tree, - delayed_node->inode_id); - free = true; - } + /* + * Once our refcount goes to zero, nobody is allowed to bump it + * back up. We can delete it now. + */ + ASSERT(refcount_read(&delayed_node->refs) == 0); + radix_tree_delete(&root->delayed_nodes_tree, + delayed_node->inode_id); spin_unlock(&root->inode_lock); - if (free) - kmem_cache_free(delayed_node_cache, delayed_node); + kmem_cache_free(delayed_node_cache, delayed_node); } } From 95a362c9a6892085f714eb6e31eea6a0e3aa93bf Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 2 Jan 2018 17:21:10 +0000 Subject: [PATCH 0980/1013] efi/capsule-loader: Reinstate virtual capsule mapping commit f24c4d478013d82bd1b943df566fff3561d52864 upstream. Commit: 82c3768b8d68 ("efi/capsule-loader: Use a cached copy of the capsule header") ... refactored the capsule loading code that maps the capsule header, to avoid having to map it several times. However, as it turns out, the vmap() call we ended up removing did not just map the header, but the entire capsule image, and dropping this virtual mapping breaks capsules that are processed by the firmware immediately (i.e., without a reboot). 
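What the removed mapping used to provide is a single virtually contiguous view of a capsule that was collected page by page, roughly (sketch; the page array and count stand for the loader's own bookkeeping):

	static void *map_capsule(struct page **pages, unsigned int count)
	{
		/* vunmap() the result once the firmware call has returned */
		return vmap(pages, count, VM_MAP, PAGE_KERNEL);
	}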
Unfortunately, that change was part of a larger refactor that allowed a quirk to be implemented for Quark, which has a non-standard memory layout for capsules, and we have slightly painted ourselves into a corner by allowing quirk code to mangle the capsule header and memory layout. So we need to fix this without breaking Quark. Fortunately, Quark does not appear to care about the virtual mapping, and so we can simply do a partial revert of commit: 2a457fb31df6 ("efi/capsule-loader: Use page addresses rather than struct page pointers") ... and create a vmap() mapping of the entire capsule (including header) based on the reinstated struct page array, unless running on Quark, in which case we pass the capsule header copy as before. Reported-by: Ge Song Tested-by: Bryan O'Donoghue Tested-by: Ge Song Signed-off-by: Ard Biesheuvel Cc: Dave Young Cc: Linus Torvalds Cc: Matt Fleming Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: linux-efi@vger.kernel.org Fixes: 82c3768b8d68 ("efi/capsule-loader: Use a cached copy of the capsule header") Link: http://lkml.kernel.org/r/20180102172110.17018-3-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/platform/efi/quirks.c | 13 +++++++- drivers/firmware/efi/capsule-loader.c | 45 ++++++++++++++++++++++----- include/linux/efi.h | 4 ++- 3 files changed, 52 insertions(+), 10 deletions(-) diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c index 8a99a2e96537..5b513ccffde4 100644 --- a/arch/x86/platform/efi/quirks.c +++ b/arch/x86/platform/efi/quirks.c @@ -592,7 +592,18 @@ static int qrk_capsule_setup_info(struct capsule_info *cap_info, void **pkbuff, /* * Update the first page pointer to skip over the CSH header. */ - cap_info->pages[0] += csh->headersize; + cap_info->phys[0] += csh->headersize; + + /* + * cap_info->capsule should point at a virtual mapping of the entire + * capsule, starting at the capsule header. Our image has the Quark + * security header prepended, so we cannot rely on the default vmap() + * mapping created by the generic capsule code. + * Given that the Quark firmware does not appear to care about the + * virtual mapping, let's just point cap_info->capsule at our copy + * of the capsule header. 
+ */ + cap_info->capsule = &cap_info->header; return 1; } diff --git a/drivers/firmware/efi/capsule-loader.c b/drivers/firmware/efi/capsule-loader.c index ec8ac5c4dd84..055e2e8f985a 100644 --- a/drivers/firmware/efi/capsule-loader.c +++ b/drivers/firmware/efi/capsule-loader.c @@ -20,10 +20,6 @@ #define NO_FURTHER_WRITE_ACTION -1 -#ifndef phys_to_page -#define phys_to_page(x) pfn_to_page((x) >> PAGE_SHIFT) -#endif - /** * efi_free_all_buff_pages - free all previous allocated buffer pages * @cap_info: pointer to current instance of capsule_info structure @@ -35,7 +31,7 @@ static void efi_free_all_buff_pages(struct capsule_info *cap_info) { while (cap_info->index > 0) - __free_page(phys_to_page(cap_info->pages[--cap_info->index])); + __free_page(cap_info->pages[--cap_info->index]); cap_info->index = NO_FURTHER_WRITE_ACTION; } @@ -71,6 +67,14 @@ int __efi_capsule_setup_info(struct capsule_info *cap_info) cap_info->pages = temp_page; + temp_page = krealloc(cap_info->phys, + pages_needed * sizeof(phys_addr_t *), + GFP_KERNEL | __GFP_ZERO); + if (!temp_page) + return -ENOMEM; + + cap_info->phys = temp_page; + return 0; } @@ -105,9 +109,24 @@ int __weak efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff, **/ static ssize_t efi_capsule_submit_update(struct capsule_info *cap_info) { + bool do_vunmap = false; int ret; - ret = efi_capsule_update(&cap_info->header, cap_info->pages); + /* + * cap_info->capsule may have been assigned already by a quirk + * handler, so only overwrite it if it is NULL + */ + if (!cap_info->capsule) { + cap_info->capsule = vmap(cap_info->pages, cap_info->index, + VM_MAP, PAGE_KERNEL); + if (!cap_info->capsule) + return -ENOMEM; + do_vunmap = true; + } + + ret = efi_capsule_update(cap_info->capsule, cap_info->phys); + if (do_vunmap) + vunmap(cap_info->capsule); if (ret) { pr_err("capsule update failed\n"); return ret; @@ -165,10 +184,12 @@ static ssize_t efi_capsule_write(struct file *file, const char __user *buff, goto failed; } - cap_info->pages[cap_info->index++] = page_to_phys(page); + cap_info->pages[cap_info->index] = page; + cap_info->phys[cap_info->index] = page_to_phys(page); cap_info->page_bytes_remain = PAGE_SIZE; + cap_info->index++; } else { - page = phys_to_page(cap_info->pages[cap_info->index - 1]); + page = cap_info->pages[cap_info->index - 1]; } kbuff = kmap(page); @@ -252,6 +273,7 @@ static int efi_capsule_release(struct inode *inode, struct file *file) struct capsule_info *cap_info = file->private_data; kfree(cap_info->pages); + kfree(cap_info->phys); kfree(file->private_data); file->private_data = NULL; return 0; @@ -281,6 +303,13 @@ static int efi_capsule_open(struct inode *inode, struct file *file) return -ENOMEM; } + cap_info->phys = kzalloc(sizeof(void *), GFP_KERNEL); + if (!cap_info->phys) { + kfree(cap_info->pages); + kfree(cap_info); + return -ENOMEM; + } + file->private_data = cap_info; return 0; diff --git a/include/linux/efi.h b/include/linux/efi.h index d813f7b04da7..29fdf8029cf6 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -140,11 +140,13 @@ struct efi_boot_memmap { struct capsule_info { efi_capsule_header_t header; + efi_capsule_header_t *capsule; int reset_type; long index; size_t count; size_t total_size; - phys_addr_t *pages; + struct page **pages; + phys_addr_t *phys; size_t page_bytes_remain; }; From 491062a0099c8bf6841dece8dd501d9fc2b80e34 Mon Sep 17 00:00:00 2001 From: Jan Engelhardt Date: Tue, 19 Dec 2017 19:09:07 +0100 Subject: [PATCH 0981/1013] crypto: n2 - cure use after free commit 
203f45003a3d03eea8fa28d74cfc74c354416fdb upstream. queue_cache_init is first called for the Control Word Queue (n2_crypto_probe). At that time, queue_cache[0] is NULL and a new kmem_cache will be allocated. If the subsequent n2_register_algs call fails, the kmem_cache will be released in queue_cache_destroy, but queue_cache_init[0] is not set back to NULL. So when the Module Arithmetic Unit gets probed next (n2_mau_probe), queue_cache_init will not allocate a kmem_cache again, but leave it as its bogus value, causing a BUG() to trigger when queue_cache[0] is eventually passed to kmem_cache_zalloc: n2_crypto: Found N2CP at /virtual-devices@100/n2cp@7 n2_crypto: Registered NCS HVAPI version 2.0 called queue_cache_init n2_crypto: md5 alg registration failed n2cp f028687c: /virtual-devices@100/n2cp@7: Unable to register algorithms. called queue_cache_destroy n2cp: probe of f028687c failed with error -22 n2_crypto: Found NCP at /virtual-devices@100/ncp@6 n2_crypto: Registered NCS HVAPI version 2.0 called queue_cache_init kernel BUG at mm/slab.c:2993! Call Trace: [0000000000604488] kmem_cache_alloc+0x1a8/0x1e0 (inlined) kmem_cache_zalloc (inlined) new_queue (inlined) spu_queue_setup (inlined) handle_exec_unit [0000000010c61eb4] spu_mdesc_scan+0x1f4/0x460 [n2_crypto] [0000000010c62b80] n2_mau_probe+0x100/0x220 [n2_crypto] [000000000084b174] platform_drv_probe+0x34/0xc0 Signed-off-by: Jan Engelhardt Acked-by: David S. Miller Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/n2_core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/crypto/n2_core.c b/drivers/crypto/n2_core.c index a9fd8b9e86cd..699ee5a9a8f9 100644 --- a/drivers/crypto/n2_core.c +++ b/drivers/crypto/n2_core.c @@ -1625,6 +1625,7 @@ static int queue_cache_init(void) CWQ_ENTRY_SIZE, 0, NULL); if (!queue_cache[HV_NCS_QTYPE_CWQ - 1]) { kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]); + queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL; return -ENOMEM; } return 0; @@ -1634,6 +1635,8 @@ static void queue_cache_destroy(void) { kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_MAU - 1]); kmem_cache_destroy(queue_cache[HV_NCS_QTYPE_CWQ - 1]); + queue_cache[HV_NCS_QTYPE_MAU - 1] = NULL; + queue_cache[HV_NCS_QTYPE_CWQ - 1] = NULL; } static long spu_queue_register_workfn(void *arg) From 9c36498f74533e0b1fdcb9692a04482bc2d91d8d Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 11 Dec 2017 12:15:17 -0800 Subject: [PATCH 0982/1013] crypto: chacha20poly1305 - validate the digest size commit e57121d08c38dabec15cf3e1e2ad46721af30cae upstream. If the rfc7539 template was instantiated with a hash algorithm with digest size larger than 16 bytes (POLY1305_DIGEST_SIZE), then the digest overran the 'tag' buffer in 'struct chachapoly_req_ctx', corrupting the subsequent memory, including 'cryptlen'. This caused a crash during crypto_skcipher_decrypt(). Fix it by, when instantiating the template, requiring that the underlying hash algorithm has the digest size expected for Poly1305. 
Reproducer: #include #include #include int main() { int algfd, reqfd; struct sockaddr_alg addr = { .salg_type = "aead", .salg_name = "rfc7539(chacha20,sha256)", }; unsigned char buf[32] = { 0 }; algfd = socket(AF_ALG, SOCK_SEQPACKET, 0); bind(algfd, (void *)&addr, sizeof(addr)); setsockopt(algfd, SOL_ALG, ALG_SET_KEY, buf, sizeof(buf)); reqfd = accept(algfd, 0, 0); write(reqfd, buf, 16); read(reqfd, buf, 16); } Reported-by: syzbot Fixes: 71ebc4d1b27d ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539") Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/chacha20poly1305.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/crypto/chacha20poly1305.c b/crypto/chacha20poly1305.c index db1bc3147bc4..600afa99941f 100644 --- a/crypto/chacha20poly1305.c +++ b/crypto/chacha20poly1305.c @@ -610,6 +610,11 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, algt->mask)); if (IS_ERR(poly)) return PTR_ERR(poly); + poly_hash = __crypto_hash_alg_common(poly); + + err = -EINVAL; + if (poly_hash->digestsize != POLY1305_DIGEST_SIZE) + goto out_put_poly; err = -ENOMEM; inst = kzalloc(sizeof(*inst) + sizeof(*ctx), GFP_KERNEL); @@ -618,7 +623,6 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb, ctx = aead_instance_ctx(inst); ctx->saltlen = CHACHAPOLY_IV_SIZE - ivsize; - poly_hash = __crypto_hash_alg_common(poly); err = crypto_init_ahash_spawn(&ctx->poly, poly_hash, aead_crypto_instance(inst)); if (err) From 7156c794b8ab462705e6ac80c5fa69565eb44c62 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Wed, 20 Dec 2017 14:28:25 -0800 Subject: [PATCH 0983/1013] crypto: pcrypt - fix freeing pcrypt instances commit d76c68109f37cb85b243a1cf0f40313afd2bae68 upstream. pcrypt is using the old way of freeing instances, where the ->free() method specified in the 'struct crypto_template' is passed a pointer to the 'struct crypto_instance'. But the crypto_instance is being kfree()'d directly, which is incorrect because the memory was actually allocated as an aead_instance, which contains the crypto_instance at a nonzero offset. Thus, the wrong pointer was being kfree()'d. Fix it by switching to the new way to free aead_instance's where the ->free() method is specified in the aead_instance itself. 
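The underlying mistake is generic: kfree() has to be handed the pointer that the allocator returned, not a pointer to a member sitting at a non-zero offset inside the allocation. Schematically, with made-up names:

	struct outer {			/* what is actually allocated */
		unsigned long head;
		struct inner base;	/* lives at a non-zero offset */
	};

	static void broken_free(struct inner *i)
	{
		kfree(i);		/* interior pointer: wrong */
	}

	static void fixed_free(struct inner *i)
	{
		kfree(container_of(i, struct outer, base));
	}

Specifying ->free() on the aead_instance means the callback is handed the outer object directly, so the container_of() step is not even needed.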
Reported-by: syzbot Fixes: 0496f56065e0 ("crypto: pcrypt - Add support for new AEAD interface") Signed-off-by: Eric Biggers Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- crypto/pcrypt.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/crypto/pcrypt.c b/crypto/pcrypt.c index ee9cfb99fe25..f8ec3d4ba4a8 100644 --- a/crypto/pcrypt.c +++ b/crypto/pcrypt.c @@ -254,6 +254,14 @@ static void pcrypt_aead_exit_tfm(struct crypto_aead *tfm) crypto_free_aead(ctx->child); } +static void pcrypt_free(struct aead_instance *inst) +{ + struct pcrypt_instance_ctx *ctx = aead_instance_ctx(inst); + + crypto_drop_aead(&ctx->spawn); + kfree(inst); +} + static int pcrypt_init_instance(struct crypto_instance *inst, struct crypto_alg *alg) { @@ -319,6 +327,8 @@ static int pcrypt_create_aead(struct crypto_template *tmpl, struct rtattr **tb, inst->alg.encrypt = pcrypt_aead_encrypt; inst->alg.decrypt = pcrypt_aead_decrypt; + inst->free = pcrypt_free; + err = aead_register_instance(tmpl, inst); if (err) goto out_drop_aead; @@ -349,14 +359,6 @@ static int pcrypt_create(struct crypto_template *tmpl, struct rtattr **tb) return -EINVAL; } -static void pcrypt_free(struct crypto_instance *inst) -{ - struct pcrypt_instance_ctx *ctx = crypto_instance_ctx(inst); - - crypto_drop_aead(&ctx->spawn); - kfree(inst); -} - static int pcrypt_cpumask_change_notify(struct notifier_block *self, unsigned long val, void *data) { @@ -469,7 +471,6 @@ static void pcrypt_fini_padata(struct padata_pcrypt *pcrypt) static struct crypto_template pcrypt_tmpl = { .name = "pcrypt", .create = pcrypt_create, - .free = pcrypt_free, .module = THIS_MODULE, }; From d4f9aaeb26d024f2f239b19a68d3e339628e1066 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 5 Dec 2017 11:10:26 +0100 Subject: [PATCH 0984/1013] crypto: chelsio - select CRYPTO_GF128MUL commit d042566d8c704e1ecec370300545d4a409222e39 upstream. Without the gf128mul library support, we can run into a link error: drivers/crypto/chelsio/chcr_algo.o: In function `chcr_update_tweak': chcr_algo.c:(.text+0x7e0): undefined reference to `gf128mul_x8_ble' This adds a Kconfig select statement for it, next to the ones we already have. Fixes: b8fd1f4170e7 ("crypto: chcr - Add ctr mode and process large sg entries for cipher") Signed-off-by: Arnd Bergmann Signed-off-by: Herbert Xu Signed-off-by: Greg Kroah-Hartman --- drivers/crypto/chelsio/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/crypto/chelsio/Kconfig b/drivers/crypto/chelsio/Kconfig index 3e104f5aa0c2..b56b3f711d94 100644 --- a/drivers/crypto/chelsio/Kconfig +++ b/drivers/crypto/chelsio/Kconfig @@ -5,6 +5,7 @@ config CRYPTO_DEV_CHELSIO select CRYPTO_SHA256 select CRYPTO_SHA512 select CRYPTO_AUTHENC + select CRYPTO_GF128MUL ---help--- The Chelsio Crypto Co-processor driver for T6 adapters. From 5d8c39173bd71ed53ae88de77a8ec94f30b69919 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= Date: Fri, 8 Dec 2017 23:37:36 +0200 Subject: [PATCH 0985/1013] drm/i915: Disable DC states around GMBUS on GLK MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 3488d0237f6364614f0c59d6d784bb79b11eeb92 upstream. Prevent the DMC from destroying GMBUS transfers on GLK. GMBUS lives in PG1 so DC off is all we need. 
Signed-off-by: Ville Syrjälä Link: https://patchwork.freedesktop.org/patch/msgid/20171208213739.16388-1-ville.syrjala@linux.intel.com Reviewed-by: Dhinakaran Pandiyan (cherry picked from commit 156961ae7bdf6feb72778e8da83d321b273343fd) Signed-off-by: Jani Nikula Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/intel_runtime_pm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index 49577eba8e7e..6e774b2673b4 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -1786,6 +1786,7 @@ void intel_display_power_put(struct drm_i915_private *dev_priv, GLK_DISPLAY_POWERWELL_2_POWER_DOMAINS | \ BIT_ULL(POWER_DOMAIN_MODESET) | \ BIT_ULL(POWER_DOMAIN_AUX_A) | \ + BIT_ULL(POWER_DOMAIN_GMBUS) | \ BIT_ULL(POWER_DOMAIN_INIT)) #define CNL_DISPLAY_POWERWELL_2_POWER_DOMAINS ( \ From 5d8c5beecdd9098458a047a871cd20fae3f4c6b2 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Tue, 2 Jan 2018 12:18:37 -0800 Subject: [PATCH 0986/1013] drm/i915: Apply Display WA #1183 on skl, kbl, and cfl MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 30414f3010aff95ffdb6bed7b9dce62cde94fdc7 upstream. Display WA #1183 was recently added to workaround "Failures when enabling DPLL0 with eDP link rate 2.16 or 4.32 GHz and CD clock frequency 308.57 or 617.14 MHz (CDCLK_CTL CD Frequency Select 10b or 11b) used in this enabling or in previous enabling." This workaround was designed to minimize the impact only to save the bad case with that link rates. But HW engineers indicated that it should be safe to apply broadly, although they were expecting the DPLL0 link rate to be unchanged on runtime. We need to cover 2 cases: when we are in fact enabling DPLL0 and when we are just changing the frequency with small differences. This is based on previous patch by Rodrigo Vivi with suggestions from Ville Syrjälä. 
Cc: Arthur J Runyan Cc: Ville Syrjälä Cc: Rodrigo Vivi Signed-off-by: Lucas De Marchi Reviewed-by: Ville Syrjälä Signed-off-by: Rodrigo Vivi Link: https://patchwork.freedesktop.org/patch/msgid/20171204232210.4958-1-lucas.demarchi@intel.com (cherry picked from commit 53421c2fe99ce16838639ad89d772d914a119a49) [ Lucas: Backport to 4.15 adding back variable that has been removed on commits not meant to be backported ] Signed-off-by: Jani Nikula Link: https://patchwork.freedesktop.org/patch/msgid/20180102201837.6812-1-lucas.demarchi@intel.com Signed-off-by: Greg Kroah-Hartman --- drivers/gpu/drm/i915/i915_reg.h | 2 ++ drivers/gpu/drm/i915/intel_cdclk.c | 35 ++++++++++++++++++------- drivers/gpu/drm/i915/intel_runtime_pm.c | 10 +++++++ 3 files changed, 38 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index c9bcc6c45012..ce2ed16f2a30 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -6944,6 +6944,7 @@ enum { #define RESET_PCH_HANDSHAKE_ENABLE (1<<4) #define GEN8_CHICKEN_DCPR_1 _MMIO(0x46430) +#define SKL_SELECT_ALTERNATE_DC_EXIT (1<<30) #define MASK_WAKEMEM (1<<13) #define SKL_DFSM _MMIO(0x51000) @@ -8475,6 +8476,7 @@ enum skl_power_gate { #define BXT_CDCLK_CD2X_DIV_SEL_2 (2<<22) #define BXT_CDCLK_CD2X_DIV_SEL_4 (3<<22) #define BXT_CDCLK_CD2X_PIPE(pipe) ((pipe)<<20) +#define CDCLK_DIVMUX_CD_OVERRIDE (1<<19) #define BXT_CDCLK_CD2X_PIPE_NONE BXT_CDCLK_CD2X_PIPE(3) #define BXT_CDCLK_SSA_PRECHARGE_ENABLE (1<<16) #define CDCLK_FREQ_DECIMAL_MASK (0x7ff) diff --git a/drivers/gpu/drm/i915/intel_cdclk.c b/drivers/gpu/drm/i915/intel_cdclk.c index 1241e5891b29..26a8dcd2c549 100644 --- a/drivers/gpu/drm/i915/intel_cdclk.c +++ b/drivers/gpu/drm/i915/intel_cdclk.c @@ -859,16 +859,10 @@ static void skl_set_preferred_cdclk_vco(struct drm_i915_private *dev_priv, static void skl_dpll0_enable(struct drm_i915_private *dev_priv, int vco) { - int min_cdclk = skl_calc_cdclk(0, vco); u32 val; WARN_ON(vco != 8100000 && vco != 8640000); - /* select the minimum CDCLK before enabling DPLL 0 */ - val = CDCLK_FREQ_337_308 | skl_cdclk_decimal(min_cdclk); - I915_WRITE(CDCLK_CTL, val); - POSTING_READ(CDCLK_CTL); - /* * We always enable DPLL0 with the lowest link rate possible, but still * taking into account the VCO required to operate the eDP panel at the @@ -922,7 +916,7 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv, { int cdclk = cdclk_state->cdclk; int vco = cdclk_state->vco; - u32 freq_select, pcu_ack; + u32 freq_select, pcu_ack, cdclk_ctl; int ret; WARN_ON((cdclk == 24000) != (vco == 0)); @@ -939,7 +933,7 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv, return; } - /* set CDCLK_CTL */ + /* Choose frequency for this cdclk */ switch (cdclk) { case 450000: case 432000: @@ -967,10 +961,33 @@ static void skl_set_cdclk(struct drm_i915_private *dev_priv, dev_priv->cdclk.hw.vco != vco) skl_dpll0_disable(dev_priv); + cdclk_ctl = I915_READ(CDCLK_CTL); + + if (dev_priv->cdclk.hw.vco != vco) { + /* Wa Display #1183: skl,kbl,cfl */ + cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK); + cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk); + I915_WRITE(CDCLK_CTL, cdclk_ctl); + } + + /* Wa Display #1183: skl,kbl,cfl */ + cdclk_ctl |= CDCLK_DIVMUX_CD_OVERRIDE; + I915_WRITE(CDCLK_CTL, cdclk_ctl); + POSTING_READ(CDCLK_CTL); + if (dev_priv->cdclk.hw.vco != vco) skl_dpll0_enable(dev_priv, vco); - I915_WRITE(CDCLK_CTL, freq_select | skl_cdclk_decimal(cdclk)); + /* Wa Display #1183: skl,kbl,cfl */ + 
cdclk_ctl &= ~(CDCLK_FREQ_SEL_MASK | CDCLK_FREQ_DECIMAL_MASK); + I915_WRITE(CDCLK_CTL, cdclk_ctl); + + cdclk_ctl |= freq_select | skl_cdclk_decimal(cdclk); + I915_WRITE(CDCLK_CTL, cdclk_ctl); + + /* Wa Display #1183: skl,kbl,cfl */ + cdclk_ctl &= ~CDCLK_DIVMUX_CD_OVERRIDE; + I915_WRITE(CDCLK_CTL, cdclk_ctl); POSTING_READ(CDCLK_CTL); /* inform PCU of the change */ diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c index 6e774b2673b4..51cb5293bf43 100644 --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -598,6 +598,11 @@ void gen9_enable_dc5(struct drm_i915_private *dev_priv) DRM_DEBUG_KMS("Enabling DC5\n"); + /* Wa Display #1183: skl,kbl,cfl */ + if (IS_GEN9_BC(dev_priv)) + I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) | + SKL_SELECT_ALTERNATE_DC_EXIT); + gen9_set_dc_state(dev_priv, DC_STATE_EN_UPTO_DC5); } @@ -625,6 +630,11 @@ void skl_disable_dc6(struct drm_i915_private *dev_priv) { DRM_DEBUG_KMS("Disabling DC6\n"); + /* Wa Display #1183: skl,kbl,cfl */ + if (IS_GEN9_BC(dev_priv)) + I915_WRITE(GEN8_CHICKEN_DCPR_1, I915_READ(GEN8_CHICKEN_DCPR_1) | + SKL_SELECT_ALTERNATE_DC_EXIT); + gen9_set_dc_state(dev_priv, DC_STATE_DISABLE); } From 2ee35943b11f07de678642a8f08e61e5d5f3867a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Stefan=20Br=C3=BCns?= Date: Mon, 27 Nov 2017 20:05:34 +0100 Subject: [PATCH 0987/1013] sunxi-rsb: Include OF based modalias in device uevent MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit e2bf801ecd4e62222a46d1ba9e57e710171d29c1 upstream. Include the OF-based modalias in the uevent sent when registering devices on the sunxi RSB bus, so that user space has a chance to autoload the kernel module for the device. Fixes a regression caused by commit 3f241bfa60bd ("arm64: allwinner: a64: pine64: Use dcdc1 regulator for mmc0"). When the axp20x-rsb module for the AXP803 PMIC is built as a module, it is not loaded and the system ends up with an disfunctional MMC controller. Fixes: d787dcdb9c8f ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus") Acked-by: Chen-Yu Tsai Signed-off-by: Stefan Brüns Signed-off-by: Maxime Ripard Signed-off-by: Greg Kroah-Hartman --- drivers/bus/sunxi-rsb.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/bus/sunxi-rsb.c b/drivers/bus/sunxi-rsb.c index 328ca93781cf..1b76d9585902 100644 --- a/drivers/bus/sunxi-rsb.c +++ b/drivers/bus/sunxi-rsb.c @@ -178,6 +178,7 @@ static struct bus_type sunxi_rsb_bus = { .match = sunxi_rsb_device_match, .probe = sunxi_rsb_device_probe, .remove = sunxi_rsb_device_remove, + .uevent = of_device_uevent_modalias, }; static void sunxi_rsb_dev_release(struct device *dev) From 2335a22b2224f71ac639becef4a6cb8ef54c8b7e Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 2 Jan 2018 10:02:19 +0000 Subject: [PATCH 0988/1013] fscache: Fix the default for fscache_maybe_release_page() commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream. Fix the default for fscache_maybe_release_page() for when the cookie isn't valid or the page isn't cached. It mustn't return false as that indicates the page cannot yet be freed. The problem with the default is that if, say, there's no cache, but a network filesystem's pages are using up almost all the available memory, a system can OOM because the filesystem ->releasepage() op will not allow them to be released as fscache_maybe_release_page() incorrectly prevents it. 
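The return value is consumed by the netfs ->releasepage() implementations roughly like this (sketch; page_cookie() stands in for each filesystem's own way of finding the inode's cookie):

	static int netfs_releasepage(struct page *page, gfp_t gfp)
	{
		struct fscache_cookie *cookie = page_cookie(page);

		/* false means "cannot release yet": the VM keeps the page */
		if (!fscache_maybe_release_page(cookie, page, gfp))
			return 0;

		/* ... drop any remaining private state ... */
		return 1;	/* page may now be freed */
	}

so a helper that can never return true pins every such page in memory.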
This can be tested by writing a sequence of 512MiB files to an AFS mount. It does not affect NFS or CIFS because both of those wrap the call in a check of PG_fscache and it shouldn't bother Ceph as that only has PG_private set whilst writeback is in progress. This might be an issue for 9P, however. Note that the pages aren't entirely stuck. Removing a file or unmounting will clear things because that uses ->invalidatepage() instead. Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions") Reported-by: Marc Dionne Signed-off-by: David Howells Reviewed-by: Jeff Layton Acked-by: Al Viro Tested-by: Marc Dionne Signed-off-by: Greg Kroah-Hartman --- include/linux/fscache.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/fscache.h b/include/linux/fscache.h index f4ff47d4a893..fe0c349684fa 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -755,7 +755,7 @@ bool fscache_maybe_release_page(struct fscache_cookie *cookie, { if (fscache_cookie_valid(cookie) && PageFsCache(page)) return __fscache_maybe_release_page(cookie, page, gfp); - return false; + return true; } /** From eff3e4fde76492931a45e23767c7842524b0ea06 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 13 Nov 2017 02:15:39 +0100 Subject: [PATCH 0989/1013] x86 / CPU: Avoid unnecessary IPIs in arch_freq_get_on_cpu() commit b29c6ef7bb1257853c1e31616d84f55e561cf631 upstream. Even though aperfmperf_snapshot_khz() caches the samples.khz value to return if called again in a sufficiently short time, its caller, arch_freq_get_on_cpu(), still uses smp_call_function_single() to run it which may allow user space to trigger an IPI storm by reading from the scaling_cur_freq cpufreq sysfs file in a tight loop. To avoid that, move the decision on whether or not to return the cached samples.khz value to arch_freq_get_on_cpu(). This change was part of commit 941f5f0f6ef5 ("x86: CPU: Fix up "cpu MHz" in /proc/cpuinfo"), but it was not the reason for the revert and it remains applicable. Fixes: 4815d3c56d1e (cpufreq: x86: Make scaling_cur_freq behave more as expected) Signed-off-by: Rafael J. Wysocki Reviewed-by: WANG Chao Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/aperfmperf.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c index 0ee83321a313..957813e0180d 100644 --- a/arch/x86/kernel/cpu/aperfmperf.c +++ b/arch/x86/kernel/cpu/aperfmperf.c @@ -42,10 +42,6 @@ static void aperfmperf_snapshot_khz(void *dummy) s64 time_delta = ktime_ms_delta(now, s->time); unsigned long flags; - /* Don't bother re-computing within the cache threshold time. */ - if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS) - return; - local_irq_save(flags); rdmsrl(MSR_IA32_APERF, aperf); rdmsrl(MSR_IA32_MPERF, mperf); @@ -74,6 +70,7 @@ static void aperfmperf_snapshot_khz(void *dummy) unsigned int arch_freq_get_on_cpu(int cpu) { + s64 time_delta; unsigned int khz; if (!cpu_khz) @@ -82,6 +79,12 @@ unsigned int arch_freq_get_on_cpu(int cpu) if (!static_cpu_has(X86_FEATURE_APERFMPERF)) return 0; + /* Don't bother re-computing within the cache threshold time. 
*/ + time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu)); + khz = per_cpu(samples.khz, cpu); + if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS) + return khz; + smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1); khz = per_cpu(samples.khz, cpu); if (khz) From 22af48be826c4193bba2f11112330aabb2568594 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 15 Nov 2017 02:13:40 +0100 Subject: [PATCH 0990/1013] x86 / CPU: Always show current CPU frequency in /proc/cpuinfo commit 7d5905dc14a87805a59f3c5bf70173aac2bb18f8 upstream. After commit 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo on x86 can be either the nominal CPU frequency (which is constant) or the frequency most recently requested by a scaling governor in cpufreq, depending on the cpufreq configuration. That is somewhat inconsistent and is different from what it was before 4.13, so in order to restore the previous behavior, make it report the current CPU frequency like the scaling_cur_freq sysfs file in cpufreq. To that end, modify the /proc/cpuinfo implementation on x86 to use aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback registers, if available, and use their values to compute the CPU frequency to be reported as "cpu MHz". However, do that carefully enough to avoid accumulating delays that lead to unacceptable access times for /proc/cpuinfo on systems with many CPUs. Run aperfmperf_snapshot_khz() once on all CPUs asynchronously at the /proc/cpuinfo open time, add a single delay upfront (if necessary) at that point and simply compute the current frequency while running show_cpuinfo() for each individual CPU. Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce the default delay between consecutive APERF and MPERF reads to 10 ms, which should be sufficient to get large enough numbers for the frequency computation in all cases. Fixes: 890da9cf0983 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"") Signed-off-by: Rafael J. 
Wysocki Acked-by: Thomas Gleixner Tested-by: Thomas Gleixner Acked-by: Ingo Molnar Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/Makefile | 2 +- arch/x86/kernel/cpu/aperfmperf.c | 74 ++++++++++++++++++++++---------- arch/x86/kernel/cpu/cpu.h | 3 ++ arch/x86/kernel/cpu/proc.c | 6 ++- fs/proc/cpuinfo.c | 6 +++ include/linux/cpufreq.h | 1 + 6 files changed, 68 insertions(+), 24 deletions(-) diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile index 90cb82dbba57..570e8bb1f386 100644 --- a/arch/x86/kernel/cpu/Makefile +++ b/arch/x86/kernel/cpu/Makefile @@ -22,7 +22,7 @@ obj-y += common.o obj-y += rdrand.o obj-y += match.o obj-y += bugs.o -obj-$(CONFIG_CPU_FREQ) += aperfmperf.o +obj-y += aperfmperf.o obj-y += cpuid-deps.o obj-$(CONFIG_PROC_FS) += proc.o diff --git a/arch/x86/kernel/cpu/aperfmperf.c b/arch/x86/kernel/cpu/aperfmperf.c index 957813e0180d..7eba34df54c3 100644 --- a/arch/x86/kernel/cpu/aperfmperf.c +++ b/arch/x86/kernel/cpu/aperfmperf.c @@ -14,6 +14,8 @@ #include #include +#include "cpu.h" + struct aperfmperf_sample { unsigned int khz; ktime_t time; @@ -24,7 +26,7 @@ struct aperfmperf_sample { static DEFINE_PER_CPU(struct aperfmperf_sample, samples); #define APERFMPERF_CACHE_THRESHOLD_MS 10 -#define APERFMPERF_REFRESH_DELAY_MS 20 +#define APERFMPERF_REFRESH_DELAY_MS 10 #define APERFMPERF_STALE_THRESHOLD_MS 1000 /* @@ -38,8 +40,6 @@ static void aperfmperf_snapshot_khz(void *dummy) u64 aperf, aperf_delta; u64 mperf, mperf_delta; struct aperfmperf_sample *s = this_cpu_ptr(&samples); - ktime_t now = ktime_get(); - s64 time_delta = ktime_ms_delta(now, s->time); unsigned long flags; local_irq_save(flags); @@ -57,38 +57,68 @@ static void aperfmperf_snapshot_khz(void *dummy) if (mperf_delta == 0) return; - s->time = now; + s->time = ktime_get(); s->aperf = aperf; s->mperf = mperf; - - /* If the previous iteration was too long ago, discard it. */ - if (time_delta > APERFMPERF_STALE_THRESHOLD_MS) - s->khz = 0; - else - s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta); + s->khz = div64_u64((cpu_khz * aperf_delta), mperf_delta); } -unsigned int arch_freq_get_on_cpu(int cpu) +static bool aperfmperf_snapshot_cpu(int cpu, ktime_t now, bool wait) { - s64 time_delta; - unsigned int khz; + s64 time_delta = ktime_ms_delta(now, per_cpu(samples.time, cpu)); + + /* Don't bother re-computing within the cache threshold time. */ + if (time_delta < APERFMPERF_CACHE_THRESHOLD_MS) + return true; + + smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, wait); + + /* Return false if the previous iteration was too long ago. */ + return time_delta <= APERFMPERF_STALE_THRESHOLD_MS; +} +unsigned int aperfmperf_get_khz(int cpu) +{ if (!cpu_khz) return 0; if (!static_cpu_has(X86_FEATURE_APERFMPERF)) return 0; - /* Don't bother re-computing within the cache threshold time. 
*/ - time_delta = ktime_ms_delta(ktime_get(), per_cpu(samples.time, cpu)); - khz = per_cpu(samples.khz, cpu); - if (khz && time_delta < APERFMPERF_CACHE_THRESHOLD_MS) - return khz; + aperfmperf_snapshot_cpu(cpu, ktime_get(), true); + return per_cpu(samples.khz, cpu); +} - smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1); - khz = per_cpu(samples.khz, cpu); - if (khz) - return khz; +void arch_freq_prepare_all(void) +{ + ktime_t now = ktime_get(); + bool wait = false; + int cpu; + + if (!cpu_khz) + return; + + if (!static_cpu_has(X86_FEATURE_APERFMPERF)) + return; + + for_each_online_cpu(cpu) + if (!aperfmperf_snapshot_cpu(cpu, now, false)) + wait = true; + + if (wait) + msleep(APERFMPERF_REFRESH_DELAY_MS); +} + +unsigned int arch_freq_get_on_cpu(int cpu) +{ + if (!cpu_khz) + return 0; + + if (!static_cpu_has(X86_FEATURE_APERFMPERF)) + return 0; + + if (aperfmperf_snapshot_cpu(cpu, ktime_get(), true)) + return per_cpu(samples.khz, cpu); msleep(APERFMPERF_REFRESH_DELAY_MS); smp_call_function_single(cpu, aperfmperf_snapshot_khz, NULL, 1); diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h index f52a370b6c00..e806b11a99af 100644 --- a/arch/x86/kernel/cpu/cpu.h +++ b/arch/x86/kernel/cpu/cpu.h @@ -47,4 +47,7 @@ extern const struct cpu_dev *const __x86_cpu_dev_start[], extern void get_cpu_cap(struct cpuinfo_x86 *c); extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c); + +unsigned int aperfmperf_get_khz(int cpu); + #endif /* ARCH_X86_CPU_H */ diff --git a/arch/x86/kernel/cpu/proc.c b/arch/x86/kernel/cpu/proc.c index 6b7e17bf0b71..e7ecedafa1c8 100644 --- a/arch/x86/kernel/cpu/proc.c +++ b/arch/x86/kernel/cpu/proc.c @@ -5,6 +5,8 @@ #include #include +#include "cpu.h" + /* * Get CPU information for use by the procfs. */ @@ -78,8 +80,10 @@ static int show_cpuinfo(struct seq_file *m, void *v) seq_printf(m, "microcode\t: 0x%x\n", c->microcode); if (cpu_has(c, X86_FEATURE_TSC)) { - unsigned int freq = cpufreq_quick_get(cpu); + unsigned int freq = aperfmperf_get_khz(cpu); + if (!freq) + freq = cpufreq_quick_get(cpu); if (!freq) freq = cpu_khz; seq_printf(m, "cpu MHz\t\t: %u.%03u\n", diff --git a/fs/proc/cpuinfo.c b/fs/proc/cpuinfo.c index e0f867cd8553..96f1087e372c 100644 --- a/fs/proc/cpuinfo.c +++ b/fs/proc/cpuinfo.c @@ -1,12 +1,18 @@ // SPDX-License-Identifier: GPL-2.0 +#include #include #include #include #include +__weak void arch_freq_prepare_all(void) +{ +} + extern const struct seq_operations cpuinfo_op; static int cpuinfo_open(struct inode *inode, struct file *file) { + arch_freq_prepare_all(); return seq_open(file, &cpuinfo_op); } diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 537ff842ff73..cbf85c4c745f 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -917,6 +917,7 @@ static inline bool policy_has_boost_freq(struct cpufreq_policy *policy) } #endif +extern void arch_freq_prepare_all(void); extern unsigned int arch_freq_get_on_cpu(int cpu); /* the following are really really optional */ From 2ebd69e8c251597f51768dce320cc10f9b4c8894 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 17 Nov 2017 15:30:01 -0800 Subject: [PATCH 0991/1013] kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream. 
The comment in sig_ignored() says "Tracers may want to know about even ignored signals" but SIGKILL can not be reported to debugger and it is just wrong to return 0 in this case: SIGKILL should only kill the SIGNAL_UNKILLABLE task if it comes from the parent ns. Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on sig_task_ignored(). SISGTOP coming from within the namespace is not really right too but at least debugger can intercept it, and we can't drop it here because this will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add another ->ptrace check later, we will see. Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com Signed-off-by: Oleg Nesterov Tested-by: Kyle Huey Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/signal.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 8dcd8825b2de..7339d7371885 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -94,13 +94,15 @@ static int sig_ignored(struct task_struct *t, int sig, bool force) if (sigismember(&t->blocked, sig) || sigismember(&t->real_blocked, sig)) return 0; - if (!sig_task_ignored(t, sig, force)) - return 0; - /* - * Tracers may want to know about even ignored signals. + * Tracers may want to know about even ignored signal unless it + * is SIGKILL which can't be reported anyway but can be ignored + * by SIGNAL_UNKILLABLE task. */ - return !t->ptrace; + if (t->ptrace && sig != SIGKILL) + return 0; + + return sig_task_ignored(t, sig, force); } /* From 94429f353675f4e3e5fefdaf3e01a18d44074f87 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 17 Nov 2017 15:30:04 -0800 Subject: [PATCH 0992/1013] kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals commit ac25385089f673560867eb5179228a44ade0cfc1 upstream. Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only() signals even if force == T. This simplifies the next change and this matches the same check in get_signal() which will drop these signals anyway. Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com Signed-off-by: Oleg Nesterov Tested-by: Kyle Huey Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/signal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/signal.c b/kernel/signal.c index 7339d7371885..8882fe994886 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -78,7 +78,7 @@ static int sig_task_ignored(struct task_struct *t, int sig, bool force) handler = sig_handler(t, sig); if (unlikely(t->signal->flags & SIGNAL_UNKILLABLE) && - handler == SIG_DFL && !force) + handler == SIG_DFL && !(force && sig_kernel_only(sig))) return 1; return sig_handler_ignored(handler, sig); From 289b51d3c17be8855c5b4be87d8e0e78cc70d511 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 17 Nov 2017 15:30:08 -0800 Subject: [PATCH 0993/1013] kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal() commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream. complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy the thread group, today this is wrong in many ways. If nothing else, fatal_signal_pending() should always imply that the whole thread group (except ->group_exit_task if it is not NULL) is killed, this check breaks the rule. 
After the previous changes we can rely on sig_task_ignored(); sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want to kill this task and sig == SIGKILL OR it is traced and debugger can intercept the signal. This should hopefully fix the problem reported by Dmitry. This test-case static int init(void *arg) { for (;;) pause(); } int main(void) { char stack[16 * 1024]; for (;;) { int pid = clone(init, stack + sizeof(stack)/2, CLONE_NEWPID | SIGCHLD, NULL); assert(pid > 0); assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0); assert(waitpid(-1, NULL, WSTOPPED) == pid); assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0); assert(syscall(__NR_tkill, pid, SIGKILL) == 0); assert(pid == wait(NULL)); } } triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in task_participate_group_stop(). do_signal_stop()->signal_group_exit() checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending() checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING. And his should fix the minor security problem reported by Kyle, SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the task is the root of a pid namespace. Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com Signed-off-by: Oleg Nesterov Reported-by: Dmitry Vyukov Reported-by: Kyle Huey Reviewed-by: Kees Cook Tested-by: Kyle Huey Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman --- kernel/signal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/signal.c b/kernel/signal.c index 8882fe994886..1facff1dbbae 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -931,9 +931,9 @@ static void complete_signal(int sig, struct task_struct *p, int group) * then start taking the whole group down immediately. */ if (sig_fatal(p, sig) && - !(signal->flags & (SIGNAL_UNKILLABLE | SIGNAL_GROUP_EXIT)) && + !(signal->flags & SIGNAL_GROUP_EXIT) && !sigismember(&t->real_blocked, sig) && - (sig == SIGKILL || !t->ptrace)) { + (sig == SIGKILL || !p->ptrace)) { /* * This signal will be fatal to the whole group. */ From 3ea9d6d0ad67c6a978604e5f16af5683c435e082 Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Thu, 14 Dec 2017 11:03:01 +0000 Subject: [PATCH 0994/1013] iommu/arm-smmu-v3: Don't free page table ops twice commit 57d72e159b60456c8bb281736c02ddd3164037aa upstream. Kasan reports a double free when finalise_stage_fn fails: the io_pgtable ops are freed by arm_smmu_domain_finalise and then again by arm_smmu_domain_free. Prevent this by leaving pgtbl_ops empty on failure. 
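The fix below is an instance of a general ownership idiom, sketched here as self-contained C (plain malloc/free and invented names stand in for the io-pgtable allocator; this is not the driver code): publish the pointer into the long-lived structure only after every step that can fail, so the error path and the later destructor can never both free it.

#include <stdlib.h>

struct ops    { int dummy; };
struct domain { struct ops *ops; };

static int domain_finalise(struct domain *d, int (*finalise)(struct ops *))
{
	struct ops *ops = malloc(sizeof(*ops));
	int ret;

	if (!ops)
		return -1;

	ret = finalise(ops);
	if (ret) {
		free(ops);	/* error path is the sole owner here */
		return ret;
	}

	d->ops = ops;		/* ownership published only on success */
	return 0;
}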
Fixes: 48ec83bcbcf5 ("iommu/arm-smmu: Add initial driver support for ARM SMMUv3 devices") Reviewed-by: Robin Murphy Signed-off-by: Jean-Philippe Brucker Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/arm-smmu-v3.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index e67ba6c40faf..1ff108794384 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1611,13 +1611,15 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain) domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; domain->geometry.aperture_end = (1UL << ias) - 1; domain->geometry.force_aperture = true; - smmu_domain->pgtbl_ops = pgtbl_ops; ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg); - if (ret < 0) + if (ret < 0) { free_io_pgtable_ops(pgtbl_ops); + return ret; + } - return ret; + smmu_domain->pgtbl_ops = pgtbl_ops; + return 0; } static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) From 986128f3a74b45c6d7e639c950b69cdc933d1686 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Tue, 2 Jan 2018 12:33:14 +0000 Subject: [PATCH 0995/1013] iommu/arm-smmu-v3: Cope with duplicated Stream IDs commit 563b5cbe334e9503ab2b234e279d500fc4f76018 upstream. For PCI devices behind an aliasing PCIe-to-PCI/X bridge, the bridge alias to DevFn 0.0 on the subordinate bus may match the original RID of the device, resulting in the same SID being present in the device's fwspec twice. This causes trouble later in arm_smmu_write_strtab_ent() when we wind up visiting the STE a second time and find it already live. Avoid the issue by giving arm_smmu_install_ste_for_dev() the cleverness to skip over duplicates. It seems mildly counterintuitive compared to preventing the duplicates from existing in the first place, but since the DT and ACPI probe paths build their fwspecs differently, this is actually the cleanest and most self-contained way to deal with it. Fixes: 8f78515425da ("iommu/arm-smmu: Implement of_xlate() for SMMUv3") Reported-by: Tomasz Nowicki Tested-by: Tomasz Nowicki Tested-by: Jayachandran C. Signed-off-by: Robin Murphy Signed-off-by: Will Deacon Signed-off-by: Greg Kroah-Hartman --- drivers/iommu/arm-smmu-v3.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c index 1ff108794384..8f7a3c00b6cf 100644 --- a/drivers/iommu/arm-smmu-v3.c +++ b/drivers/iommu/arm-smmu-v3.c @@ -1646,7 +1646,7 @@ static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec) { - int i; + int i, j; struct arm_smmu_master_data *master = fwspec->iommu_priv; struct arm_smmu_device *smmu = master->smmu; @@ -1654,6 +1654,13 @@ static void arm_smmu_install_ste_for_dev(struct iommu_fwspec *fwspec) u32 sid = fwspec->ids[i]; __le64 *step = arm_smmu_get_step_for_sid(smmu, sid); + /* Bridged PCI devices may end up with duplicated IDs */ + for (j = 0; j < i; j++) + if (fwspec->ids[j] == sid) + break; + if (j < i) + continue; + arm_smmu_write_strtab_ent(smmu, sid, step, &master->ste); } } From d17237bbdf70eb2a30301eb94533fe712ba96cb0 Mon Sep 17 00:00:00 2001 From: Vineet Gupta Date: Fri, 8 Dec 2017 08:26:58 -0800 Subject: [PATCH 0996/1013] ARC: uaccess: dont use "l" gcc inline asm constraint modifier commit 79435ac78d160e4c245544d457850a56f805ac0d upstream. This used to setup the LP_COUNT register automatically, but now has been removed. 
There was an earlier fix 3c7c7a2fc8811 which fixed instance in delay.h but somehow missed this one as gcc change had not made its way into production toolchains and was not pedantic as it is now ! Signed-off-by: Vineet Gupta Signed-off-by: Greg Kroah-Hartman --- arch/arc/include/asm/uaccess.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/arc/include/asm/uaccess.h b/arch/arc/include/asm/uaccess.h index f35974ee7264..c9173c02081c 100644 --- a/arch/arc/include/asm/uaccess.h +++ b/arch/arc/include/asm/uaccess.h @@ -668,6 +668,7 @@ __arc_strncpy_from_user(char *dst, const char __user *src, long count) return 0; __asm__ __volatile__( + " mov lp_count, %5 \n" " lp 3f \n" "1: ldb.ab %3, [%2, 1] \n" " breq.d %3, 0, 3f \n" @@ -684,8 +685,8 @@ __arc_strncpy_from_user(char *dst, const char __user *src, long count) " .word 1b, 4b \n" " .previous \n" : "+r"(res), "+r"(dst), "+r"(src), "=r"(val) - : "g"(-EFAULT), "l"(count) - : "memory"); + : "g"(-EFAULT), "r"(count) + : "lp_count", "lp_start", "lp_end", "memory"); return res; } From 49b52457e1fc9cbf21ba7a3e24fcc4541e0c08d2 Mon Sep 17 00:00:00 2001 From: John Sperbeck Date: Sun, 31 Dec 2017 21:24:58 -0800 Subject: [PATCH 0997/1013] powerpc/mm: Fix SEGV on mapped region to return SEGV_ACCERR commit ecb101aed86156ec7cd71e5dca668e09146e6994 upstream. The recent refactoring of the powerpc page fault handler in commit c3350602e876 ("powerpc/mm: Make bad_area* helper functions") caused access to protected memory regions to indicate SEGV_MAPERR instead of the traditional SEGV_ACCERR in the si_code field of a user-space signal handler. This can confuse debug libraries that temporarily change the protection of memory regions, and expect to use SEGV_ACCERR as an indication to restore access to a region. This commit restores the previous behavior. The following program exhibits the issue: $ ./repro read || echo "FAILED" $ ./repro write || echo "FAILED" $ ./repro exec || echo "FAILED" #include #include #include #include #include #include #include static void segv_handler(int n, siginfo_t *info, void *arg) { _exit(info->si_code == SEGV_ACCERR ? 0 : 1); } int main(int argc, char **argv) { void *p = NULL; struct sigaction act = { .sa_sigaction = segv_handler, .sa_flags = SA_SIGINFO, }; assert(argc == 2); p = mmap(NULL, getpagesize(), (strcmp(argv[1], "write") == 0) ? 
PROT_READ : 0, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); assert(p != MAP_FAILED); assert(sigaction(SIGSEGV, &act, NULL) == 0); if (strcmp(argv[1], "read") == 0) printf("%c", *(unsigned char *)p); else if (strcmp(argv[1], "write") == 0) *(unsigned char *)p = 0; else if (strcmp(argv[1], "exec") == 0) ((void (*)(void))p)(); return 1; /* failed to generate SEGV */ } Fixes: c3350602e876 ("powerpc/mm: Make bad_area* helper functions") Signed-off-by: John Sperbeck Acked-by: Benjamin Herrenschmidt [mpe: Add commit references in change log] Signed-off-by: Michael Ellerman Signed-off-by: Greg Kroah-Hartman --- arch/powerpc/mm/fault.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c index 4797d08581ce..6e1e39035380 100644 --- a/arch/powerpc/mm/fault.c +++ b/arch/powerpc/mm/fault.c @@ -145,6 +145,11 @@ static noinline int bad_area(struct pt_regs *regs, unsigned long address) return __bad_area(regs, address, SEGV_MAPERR); } +static noinline int bad_access(struct pt_regs *regs, unsigned long address) +{ + return __bad_area(regs, address, SEGV_ACCERR); +} + static int do_sigbus(struct pt_regs *regs, unsigned long address, unsigned int fault) { @@ -490,7 +495,7 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address, good_area: if (unlikely(access_error(is_write, is_exec, vma))) - return bad_area(regs, address); + return bad_access(regs, address); /* * If for any reason at all we couldn't handle the fault, From 2df832f7aa68dd1cf454d6fa165f1c61a3cd04e5 Mon Sep 17 00:00:00 2001 From: Aaron Ma Date: Sat, 25 Nov 2017 16:48:41 -0800 Subject: [PATCH 0998/1013] Input: elantech - add new icbody type 15 commit 10d900303f1c3a821eb0bef4e7b7ece16768fba4 upstream. The touchpad of Lenovo Thinkpad L480 reports it's version as 15. Signed-off-by: Aaron Ma Signed-off-by: Dmitry Torokhov Signed-off-by: Greg Kroah-Hartman --- drivers/input/mouse/elantech.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/input/mouse/elantech.c b/drivers/input/mouse/elantech.c index b84cd978fce2..a4aaa748e987 100644 --- a/drivers/input/mouse/elantech.c +++ b/drivers/input/mouse/elantech.c @@ -1613,7 +1613,7 @@ static int elantech_set_properties(struct elantech_data *etd) case 5: etd->hw_version = 3; break; - case 6 ... 14: + case 6 ... 15: etd->hw_version = 4; break; default: From 46789641800ca2077acb66c6cbe8e2ce7575113c Mon Sep 17 00:00:00 2001 From: Tom Lendacky Date: Thu, 30 Nov 2017 16:46:40 -0600 Subject: [PATCH 0999/1013] x86/microcode/AMD: Add support for fam17h microcode loading commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf upstream. The size for the Microcode Patch Block (MPB) for an AMD family 17h processor is 3200 bytes. Add a #define for fam17h so that it does not default to 2048 bytes and fail a microcode load/update. 
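A paraphrased sketch of the size validation this limit feeds (simplified and not the driver code; the constant values come from the message and the hunk below): without a family-17h case, the switch falls through to the 2048-byte default and a legitimate 3200-byte patch is rejected.

static bool mpb_size_ok(unsigned char family, unsigned int patch_size)
{
	unsigned int max_size;

	switch (family) {
	case 0x17:
		max_size = 3200;	/* F17H_MPB_MAX_SIZE, added below */
		break;
	default:
		max_size = 2048;	/* F1XH_MPB_MAX_SIZE fallback */
		break;
	}

	return patch_size <= max_size;
}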
Signed-off-by: Tom Lendacky Signed-off-by: Thomas Gleixner Reviewed-by: Borislav Petkov Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net Signed-off-by: Ingo Molnar Cc: Alice Ferrazzi Signed-off-by: Greg Kroah-Hartman --- arch/x86/kernel/cpu/microcode/amd.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/arch/x86/kernel/cpu/microcode/amd.c b/arch/x86/kernel/cpu/microcode/amd.c index c6daec4bdba5..330b8462d426 100644 --- a/arch/x86/kernel/cpu/microcode/amd.c +++ b/arch/x86/kernel/cpu/microcode/amd.c @@ -470,6 +470,7 @@ static unsigned int verify_patch_size(u8 family, u32 patch_size, #define F14H_MPB_MAX_SIZE 1824 #define F15H_MPB_MAX_SIZE 4096 #define F16H_MPB_MAX_SIZE 3458 +#define F17H_MPB_MAX_SIZE 3200 switch (family) { case 0x14: @@ -481,6 +482,9 @@ static unsigned int verify_patch_size(u8 family, u32 patch_size, case 0x16: max_size = F16H_MPB_MAX_SIZE; break; + case 0x17: + max_size = F17H_MPB_MAX_SIZE; + break; default: max_size = F1XH_MPB_MAX_SIZE; break; From f5edee88ad430356e5c6b1c309782cce3f736272 Mon Sep 17 00:00:00 2001 From: John Johansen Date: Thu, 7 Dec 2017 00:28:27 -0800 Subject: [PATCH 1000/1013] apparmor: fix regression in mount mediation when feature set is pinned MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 5b9f57cf47b87f07210875d6a24776b4496b818d upstream. When the mount code was refactored for Labels it was not correctly updated to check whether policy supported mediation of the mount class. This causes a regression when the kernel feature set is reported as supporting mount and policy is pinned to a feature set that does not support mount mediation. BugLink: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=882697#41 Fixes: 2ea3ffb7782a ("apparmor: add mount mediation") Reported-by: Fabian Grünbichler Signed-off-by: John Johansen Signed-off-by: Greg Kroah-Hartman --- security/apparmor/mount.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/security/apparmor/mount.c b/security/apparmor/mount.c index 82a64b58041d..e395137ecff1 100644 --- a/security/apparmor/mount.c +++ b/security/apparmor/mount.c @@ -330,6 +330,9 @@ static int match_mnt_path_str(struct aa_profile *profile, AA_BUG(!mntpath); AA_BUG(!buffer); + if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT)) + return 0; + error = aa_path_name(mntpath, path_flags(profile, mntpath), buffer, &mntpnt, &info, profile->disconnected); if (error) @@ -381,6 +384,9 @@ static int match_mnt(struct aa_profile *profile, const struct path *path, AA_BUG(!profile); AA_BUG(devpath && !devbuffer); + if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT)) + return 0; + if (devpath) { error = aa_path_name(devpath, path_flags(profile, devpath), devbuffer, &devname, &info, @@ -559,6 +565,9 @@ static int profile_umount(struct aa_profile *profile, struct path *path, AA_BUG(!profile); AA_BUG(!path); + if (!PROFILE_MEDIATES(profile, AA_CLASS_MOUNT)) + return 0; + error = aa_path_name(path, path_flags(profile, path), buffer, &name, &info, profile->disconnected); if (error) @@ -614,7 +623,8 @@ static struct aa_label *build_pivotroot(struct aa_profile *profile, AA_BUG(!new_path); AA_BUG(!old_path); - if (profile_unconfined(profile)) + if (profile_unconfined(profile) || + !PROFILE_MEDIATES(profile, AA_CLASS_MOUNT)) return aa_get_newest_label(&profile->label); error = aa_path_name(old_path, path_flags(profile, old_path), From fb510265edaf4809c050e620231b6ea69507fe82 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Tue, 2 Jan 2018 
20:36:44 +0100 Subject: [PATCH 1001/1013] parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel commit 88776c0e70be0290f8357019d844aae15edaa967 upstream. Qemu for PARISC reported on a 32bit SMP parisc kernel strange failures about "Not-handled unaligned insn 0x0e8011d6 and 0x0c2011c9." Those opcodes evaluate to the ldcw() assembly instruction which requires (on 32bit) an alignment of 16 bytes to ensure atomicity. As it turns out, qemu is correct and in our assembly code in entry.S and pacache.S we don't pay attention to the required alignment. This patch fixes the problem by aligning the lock offset in assembly code in the same manner as we do in our C-code. Signed-off-by: Helge Deller Signed-off-by: Greg Kroah-Hartman --- arch/parisc/include/asm/ldcw.h | 2 ++ arch/parisc/kernel/entry.S | 13 +++++++++++-- arch/parisc/kernel/pacache.S | 9 +++++++-- 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/arch/parisc/include/asm/ldcw.h b/arch/parisc/include/asm/ldcw.h index dd5a08aaa4da..3eb4bfc1fb36 100644 --- a/arch/parisc/include/asm/ldcw.h +++ b/arch/parisc/include/asm/ldcw.h @@ -12,6 +12,7 @@ for the semaphore. */ #define __PA_LDCW_ALIGNMENT 16 +#define __PA_LDCW_ALIGN_ORDER 4 #define __ldcw_align(a) ({ \ unsigned long __ret = (unsigned long) &(a)->lock[0]; \ __ret = (__ret + __PA_LDCW_ALIGNMENT - 1) \ @@ -29,6 +30,7 @@ ldcd). */ #define __PA_LDCW_ALIGNMENT 4 +#define __PA_LDCW_ALIGN_ORDER 2 #define __ldcw_align(a) (&(a)->slock) #define __LDCW "ldcw,co" diff --git a/arch/parisc/kernel/entry.S b/arch/parisc/kernel/entry.S index f3cecf5117cf..e95207c0565e 100644 --- a/arch/parisc/kernel/entry.S +++ b/arch/parisc/kernel/entry.S @@ -35,6 +35,7 @@ #include #include #include +#include #include #include @@ -46,6 +47,14 @@ #endif .import pa_tlb_lock,data + .macro load_pa_tlb_lock reg +#if __PA_LDCW_ALIGNMENT > 4 + load32 PA(pa_tlb_lock) + __PA_LDCW_ALIGNMENT-1, \reg + depi 0,31,__PA_LDCW_ALIGN_ORDER, \reg +#else + load32 PA(pa_tlb_lock), \reg +#endif + .endm /* space_to_prot macro creates a prot id from a space id */ @@ -457,7 +466,7 @@ .macro tlb_lock spc,ptp,pte,tmp,tmp1,fault #ifdef CONFIG_SMP cmpib,COND(=),n 0,\spc,2f - load32 PA(pa_tlb_lock),\tmp + load_pa_tlb_lock \tmp 1: LDCW 0(\tmp),\tmp1 cmpib,COND(=) 0,\tmp1,1b nop @@ -480,7 +489,7 @@ /* Release pa_tlb_lock lock. */ .macro tlb_unlock1 spc,tmp #ifdef CONFIG_SMP - load32 PA(pa_tlb_lock),\tmp + load_pa_tlb_lock \tmp tlb_unlock0 \spc,\tmp #endif .endm diff --git a/arch/parisc/kernel/pacache.S b/arch/parisc/kernel/pacache.S index adf7187f8951..2d40c4ff3f69 100644 --- a/arch/parisc/kernel/pacache.S +++ b/arch/parisc/kernel/pacache.S @@ -36,6 +36,7 @@ #include #include #include +#include #include .text @@ -333,8 +334,12 @@ ENDPROC_CFI(flush_data_cache_local) .macro tlb_lock la,flags,tmp #ifdef CONFIG_SMP - ldil L%pa_tlb_lock,%r1 - ldo R%pa_tlb_lock(%r1),\la +#if __PA_LDCW_ALIGNMENT > 4 + load32 pa_tlb_lock + __PA_LDCW_ALIGNMENT-1, \la + depi 0,31,__PA_LDCW_ALIGN_ORDER, \la +#else + load32 pa_tlb_lock, \la +#endif rsm PSW_SM_I,\flags 1: LDCW 0(\la),\tmp cmpib,<>,n 0,\tmp,3f From a0fdea2a407e0f2503f6b80b1a89c44fb58d64e9 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Fri, 5 Jan 2018 21:55:38 +0100 Subject: [PATCH 1002/1013] parisc: qemu idle sleep support commit 310d82784fb4d60c80569f5ca9f53a7f3bf1d477 upstream. Add qemu idle sleep support when running under qemu with SeaBIOS PDC firmware. 
Like the power architecture we use the "or" assembler instructions, which translate to nops on real hardware, to indicate that qemu shall idle sleep. Signed-off-by: Helge Deller Cc: Richard Henderson Signed-off-by: Greg Kroah-Hartman --- arch/parisc/kernel/process.c | 39 ++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c index 30f92391a93e..cad3e8661cd6 100644 --- a/arch/parisc/kernel/process.c +++ b/arch/parisc/kernel/process.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include #include @@ -183,6 +184,44 @@ int dump_task_fpu (struct task_struct *tsk, elf_fpregset_t *r) return 1; } +/* + * Idle thread support + * + * Detect when running on QEMU with SeaBIOS PDC Firmware and let + * QEMU idle the host too. + */ + +int running_on_qemu __read_mostly; + +void __cpuidle arch_cpu_idle_dead(void) +{ + /* nop on real hardware, qemu will offline CPU. */ + asm volatile("or %%r31,%%r31,%%r31\n":::); +} + +void __cpuidle arch_cpu_idle(void) +{ + local_irq_enable(); + + /* nop on real hardware, qemu will idle sleep. */ + asm volatile("or %%r10,%%r10,%%r10\n":::); +} + +static int __init parisc_idle_init(void) +{ + const char *marker; + + /* check QEMU/SeaBIOS marker in PAGE0 */ + marker = (char *) &PAGE0->pad0; + running_on_qemu = (memcmp(marker, "SeaBIOS", 8) == 0); + + if (!running_on_qemu) + cpu_idle_poll_ctrl(1); + + return 0; +} +arch_initcall(parisc_idle_init); + /* * Copy architecture-specific thread state */ From 756ac0b046182487a2b813dd1dadb17e0d7b8b67 Mon Sep 17 00:00:00 2001 From: Boris Brezillon Date: Mon, 18 Dec 2017 11:32:45 +0100 Subject: [PATCH 1003/1013] mtd: nand: pxa3xx: Fix READOOB implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit fee4380f368e84ed216b62ccd2fbc4126f2bf40b upstream. In the current driver, OOB bytes are accessed in raw mode, and when a page access is done with NDCR_SPARE_EN set and NDCR_ECC_EN cleared, the driver must read the whole spare area (64 bytes in case of a 2k page, 16 bytes for a 512 page). The driver was only reading the free OOB bytes, which was leaving some unread data in the FIFO and was somehow leading to a timeout. We could patch the driver to read ->spare_size + ->ecc_size instead of just ->spare_size when READOOB is requested, but we'd better make in-band and OOB accesses consistent. Since the driver is always accessing in-band data in non-raw mode (with the ECC engine enabled), we should also access OOB data in this mode. That's particularly useful when using the BCH engine because in this mode the free OOB bytes are also ECC protected. 
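The spare-area sizes quoted above (64 bytes for a 2k page, 16 bytes for a 512 page) follow the usual 16-bytes-of-OOB-per-512-bytes-of-data NAND layout; a small illustrative helper (not driver code, and assuming that classic layout) makes the relationship explicit:

static unsigned int full_spare_bytes(unsigned int page_size)
{
	/* 512-byte pages carry 16 OOB bytes, 2048-byte pages carry 64. */
	return page_size / 512 * 16;
}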
Fixes: 43bcfd2bb24a ("mtd: nand: pxa3xx: Add driver-specific ECC BCH support") Reported-by: Sean Nyekjær Tested-by: Willy Tarreau Signed-off-by: Boris Brezillon Acked-by: Ezequiel Garcia Tested-by: Sean Nyekjaer Acked-by: Robert Jarzmik Signed-off-by: Richard Weinberger Signed-off-by: Greg Kroah-Hartman --- drivers/mtd/nand/pxa3xx_nand.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c index 85cff68643e0..125b744c9c28 100644 --- a/drivers/mtd/nand/pxa3xx_nand.c +++ b/drivers/mtd/nand/pxa3xx_nand.c @@ -950,6 +950,7 @@ static void prepare_start_command(struct pxa3xx_nand_info *info, int command) switch (command) { case NAND_CMD_READ0: + case NAND_CMD_READOOB: case NAND_CMD_PAGEPROG: info->use_ecc = 1; break; From 853bcadeefe18ee0661a302811a17644b1f9e80b Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Fri, 15 Dec 2017 13:14:31 +0100 Subject: [PATCH 1004/1013] KVM: s390: fix cmma migration for multiple memory slots commit 32aa144fc32abfcbf7140f473dfbd94c5b9b4105 upstream. When multiple memory slots are present the cmma migration code does not allocate enough memory for the bitmap. The memory slots are sorted in reverse order, so we must use gfn and size of slot[0] instead of the last one. Signed-off-by: Christian Borntraeger Reviewed-by: Claudio Imbrenda Fixes: 190df4a212a7 (KVM: s390: CMMA tracking, ESSA emulation, migration mode) Reviewed-by: Cornelia Huck Signed-off-by: Greg Kroah-Hartman --- arch/s390/kvm/kvm-s390.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 40d0a1a97889..b87a930c2201 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -794,11 +794,12 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm) if (kvm->arch.use_cmma) { /* - * Get the last slot. They should be sorted by base_gfn, so the - * last slot is also the one at the end of the address space. - * We have verified above that at least one slot is present. + * Get the first slot. They are reverse sorted by base_gfn, so + * the first slot is also the one at the end of the address + * space. We have verified above that at least one slot is + * present. */ - ms = slots->memslots + slots->used_slots - 1; + ms = slots->memslots; /* round up so we only use full longs */ ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG); /* allocate enough bytes to store all the bits */ From 77bbeea7811f0857952d3f096438ce2e1bb16f1a Mon Sep 17 00:00:00 2001 From: Christian Borntraeger Date: Thu, 21 Dec 2017 09:18:22 +0100 Subject: [PATCH 1005/1013] KVM: s390: prevent buffer overrun on memory hotplug during migration commit c2cf265d860882b51a200e4a7553c17827f2b730 upstream. We must not go beyond the pre-allocated buffer. This can happen when a new memory slot is added during migration. 
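Both s390 CMMA fixes above hinge on how the migration bitmap is sized; a worked example with a hypothetical slot layout (values invented, and only the two relevant fields shown) illustrates why slots[0] must be used and why late-added slots still need a bounds check:

struct memslot { unsigned long base_gfn, npages; };

static unsigned long cmma_bitmap_bits(const struct memslot *slots)
{
	/* memslots are kept sorted by base_gfn in descending order, so
	 * slots[0] covers the highest guest frame numbers and therefore
	 * bounds the bitmap (the kernel additionally rounds this up to a
	 * multiple of BITS_PER_LONG). */
	return slots[0].base_gfn + slots[0].npages;
}

/* Hypothetical layout: slots[0] = { 0x80000, 0x40000 } (top of RAM) and
 * slots[1] = { 0x00000, 0x80000 } (low RAM) give 0xC0000 bits.  A slot
 * hot-added after this sizing can hand out gfns >= that limit, which is
 * why do_essa() now checks gfn < ms->bitmap_size before touching the
 * bitmap. */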
Reported-by: David Hildenbrand Signed-off-by: Christian Borntraeger Fixes: 190df4a212a7 (KVM: s390: CMMA tracking, ESSA emulation, migration mode) Reviewed-by: Cornelia Huck Reviewed-by: David Hildenbrand Signed-off-by: Greg Kroah-Hartman --- arch/s390/kvm/priv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c index 5b25287f449b..7bd3a59232f0 100644 --- a/arch/s390/kvm/priv.c +++ b/arch/s390/kvm/priv.c @@ -1009,7 +1009,7 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc) cbrlo[entries] = gfn << PAGE_SHIFT; } - if (orc) { + if (orc && gfn < ms->bitmap_size) { /* increment only if we are really flipping the bit to 1 */ if (!test_and_set_bit(gfn, ms->pgste_bitmap)) atomic64_inc(&ms->dirty_pages); From b8447222eb207d5a5ec20a0f357065963dabdcd0 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 10 Jan 2018 09:31:23 +0100 Subject: [PATCH 1006/1013] Linux 4.14.13 --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 20f7d4de0f1c..a67c5179052a 100644 --- a/Makefile +++ b/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 VERSION = 4 PATCHLEVEL = 14 -SUBLEVEL = 12 +SUBLEVEL = 13 EXTRAVERSION = NAME = Petit Gorille From da4b823a507406e714380d88f2b405204155eaca Mon Sep 17 00:00:00 2001 From: vivek Date: Tue, 3 Nov 2015 20:43:02 +0000 Subject: [PATCH 1007/1013] remove stray linux init header from vdso --- arch/x86/um/vdso/vdso.S | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/x86/um/vdso/vdso.S b/arch/x86/um/vdso/vdso.S index a4a3870dc059..4b4bd4cc06ab 100644 --- a/arch/x86/um/vdso/vdso.S +++ b/arch/x86/um/vdso/vdso.S @@ -1,5 +1,3 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#include __INITDATA From 42e900ae641c79a036a64a31efa46f1cfb3774c2 Mon Sep 17 00:00:00 2001 From: vivek Date: Tue, 3 Nov 2015 20:43:02 +0000 Subject: [PATCH 1008/1013] i915 only disable default vga device --- drivers/gpu/drm/i915/intel_display.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 1c73d5542681..53246cdea9c9 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -14389,12 +14389,14 @@ static void i915_disable_vga(struct drm_i915_private *dev_priv) i915_reg_t vga_reg = i915_vgacntrl_reg(dev_priv); /* WaEnableVGAAccessThroughIOPort:ctg,elk,ilk,snb,ivb,vlv,hsw */ - vga_get_uninterruptible(pdev, VGA_RSRC_LEGACY_IO); - outb(SR01, VGA_SR_INDEX); - sr1 = inb(VGA_SR_DATA); - outb(sr1 | 1<<5, VGA_SR_DATA); - vga_put(pdev, VGA_RSRC_LEGACY_IO); - udelay(300); + if (pdev == vga_default_device()) { + vga_get_uninterruptible(pdev, VGA_RSRC_LEGACY_IO); + outb(SR01, VGA_SR_INDEX); + sr1 = inb(VGA_SR_DATA); + outb(sr1 | 1<<5, VGA_SR_DATA); + vga_put(pdev, VGA_RSRC_LEGACY_IO); + udelay(300); + } I915_WRITE(vga_reg, VGA_DISP_DISABLE); POSTING_READ(vga_reg); From 6635ead206ab03ae80aca4155146095b40b016d5 Mon Sep 17 00:00:00 2001 From: John Vert Date: Fri, 20 Jan 2017 14:47:55 -0800 Subject: [PATCH 1009/1013] broadwell enable fast sample c ops --- drivers/gpu/drm/i915/intel_pm.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index cb950752c346..5813634c2b6a 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -8336,6 +8336,10 @@ static void broadwell_init_clock_gating(struct drm_i915_private *dev_priv) I915_WRITE(CHICKEN_PAR2_1, 
I915_READ(CHICKEN_PAR2_1) | KVM_CONFIG_CHANGE_NOTIFICATION_SELECT); + /* Faster SAMPLE_C operations. */ + I915_WRITE(HALF_SLICE_CHICKEN3, + _MASKED_BIT_ENABLE(HSW_SAMPLE_C_PERFORMANCE)); + lpt_init_clock_gating(dev_priv); /* WaDisableDopClockGating:bdw From a04871d44b7b257bdd7f8b418fa8d8c95b71ea24 Mon Sep 17 00:00:00 2001 From: "Pierre-Loup A. Griffais" Date: Wed, 16 Nov 2016 17:17:09 -0800 Subject: [PATCH 1010/1013] Add support for the Xbox One S controller over Bluetooth --- drivers/hid/hid-core.c | 1 + drivers/hid/hid-ids.h | 2 ++ drivers/hid/hid-microsoft.c | 38 +++++++++++++++++++++++++++++++++++++ net/bluetooth/l2cap_core.c | 9 +++++++++ 4 files changed, 50 insertions(+) diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index 330ca983828b..2687c562aadc 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -2175,6 +2175,7 @@ static const struct hid_device_id hid_have_special_driver[] = { { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_USB) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_3K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_WIRELESS_OPTICAL_DESKTOP_3_0) }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_ONE_S) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_OFFICE_KB) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_7K) }, { HID_USB_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_DIGITAL_MEDIA_600) }, diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index be2e005c3c51..302658dcdf7e 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -762,6 +762,8 @@ #define USB_DEVICE_ID_MS_TOUCH_COVER_2 0x07a7 #define USB_DEVICE_ID_MS_TYPE_COVER_2 0x07a9 #define USB_DEVICE_ID_MS_POWER_COVER 0x07da +#define USB_DEVICE_ID_MS_XBOX_ONE_S 0x02e0 + #define USB_VENDOR_ID_MOJO 0x8282 #define USB_DEVICE_ID_RETRO_ADAPTER 0x3201 diff --git a/drivers/hid/hid-microsoft.c b/drivers/hid/hid-microsoft.c index 96e7d3231d2f..6495742cd245 100644 --- a/drivers/hid/hid-microsoft.c +++ b/drivers/hid/hid-microsoft.c @@ -28,6 +28,8 @@ #define MS_RDESC 0x08 #define MS_NOGET 0x10 #define MS_DUPLICATE_USAGES 0x20 +#define MS_RDESC_3K 0x40 +#define MS_XBOX_ONE_S 0x80 static __u8 *ms_report_fixup(struct hid_device *hdev, __u8 *rdesc, unsigned int *rsize) @@ -130,11 +132,43 @@ static int ms_presenter_8k_quirk(struct hid_input *hi, struct hid_usage *usage, return 1; } +static int ms_xbox_one_s_quirk(struct hid_input *hi, struct hid_usage *usage, + unsigned long **bit, int *max) +{ + switch (usage->hid & HID_USAGE_PAGE) { + + case HID_UP_GENDESK: + if ((usage->hid & 0xf0) == 0x80) { /* SystemControl */ + switch (usage->hid & 0xf) { + case 0x5: + /* override the default mapping of KEY_MENU */ + ms_map_key_clear(BTN_MODE); + return 1; + default: + break; + } + break; + } + break; + + default: + break; + } + + /* Let the normal mapping happen for everything else */ + return 0; +} + static int ms_input_mapping(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) { unsigned long quirks = (unsigned long)hid_get_drvdata(hdev); + + if (quirks & MS_XBOX_ONE_S) { + int ret = ms_xbox_one_s_quirk(hi, usage, bit, max); + return ret; + } if (quirks & MS_ERGONOMY) { int ret = ms_ergonomy_kb_quirk(hi, usage, bit, max); @@ -281,6 +315,10 @@ static const struct hid_device_id ms_devices[] = { { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_PRESENTER_8K_BT), .driver_data = 
MS_PRESENTER }, + + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_MICROSOFT, USB_DEVICE_ID_MS_XBOX_ONE_S), + .driver_data = MS_XBOX_ONE_S }, + { } }; MODULE_DEVICE_TABLE(hid, ms_devices); diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 43ba91c440bc..53978c5b09e7 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -3228,6 +3228,14 @@ static int l2cap_build_conf_req(struct l2cap_chan *chan, void *data, size_t data switch (chan->mode) { case L2CAP_MODE_BASIC: + /* HACK FOR XBOX ONE S CONTROLLERS + * If we add options to the configuration request, the + * Xbox One S controller responds with: + * Result: Failure - unknown options (0x0003) + * For now, just don't send configuration options. + * Remove this hack once the controller supports it + */ +#if 0 if (disable_ertm) break; @@ -3244,6 +3252,7 @@ static int l2cap_build_conf_req(struct l2cap_chan *chan, void *data, size_t data l2cap_add_conf_opt(&ptr, L2CAP_CONF_RFC, sizeof(rfc), (unsigned long) &rfc, endptr - ptr); +#endif break; case L2CAP_MODE_ERTM: From 747397a3fea3f641ea72531acf8078c409d52c0d Mon Sep 17 00:00:00 2001 From: vivek Date: Tue, 3 Nov 2015 20:43:03 +0000 Subject: [PATCH 1011/1013] sixaxis dualshock report via ctrl quirk --- net/bluetooth/hidp/core.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c index 8112893037bd..4dd706c5941d 100644 --- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -379,9 +379,16 @@ static int hidp_output_report(struct hid_device *hid, __u8 *data, size_t count) { struct hidp_session *session = hid->driver_data; - return hidp_send_intr_message(session, - HIDP_TRANS_DATA | HIDP_DATA_RTYPE_OUPUT, - data, count); + /* The Sixaxis and Dualshock 4 wants report sent via the ctrl channel */ + if(hid->vendor == 0x54c && (hid->product == 0x5c4 || hid->product == 0x268)) { + return hidp_send_ctrl_message(session, + HIDP_TRANS_SET_REPORT | HIDP_DATA_RTYPE_OUPUT, + data, count); + } else { + return hidp_send_intr_message(session, + HIDP_TRANS_DATA | HIDP_DATA_RTYPE_OUPUT, + data, count); + } } static int hidp_raw_request(struct hid_device *hid, unsigned char reportnum, From 9f282ed1ceabf3b2c0ef8ff859815d9cdcb3c223 Mon Sep 17 00:00:00 2001 From: John Vert Date: Tue, 31 Jan 2017 17:38:49 -0800 Subject: [PATCH 1012/1013] add REPORTING-BUGS file to fix build --- REPORTING-BUGS | 174 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 REPORTING-BUGS diff --git a/REPORTING-BUGS b/REPORTING-BUGS new file mode 100644 index 000000000000..0cb8cdfa63bc --- /dev/null +++ b/REPORTING-BUGS @@ -0,0 +1,174 @@ +Background +========== + +The upstream Linux kernel maintainers only fix bugs for specific kernel +versions. Those versions include the current "release candidate" (or -rc) +kernel, any "stable" kernel versions, and any "long term" kernels. + +Please see https://www.kernel.org/ for a list of supported kernels. Any +kernel marked with [EOL] is "end of life" and will not have any fixes +backported to it. + +If you've found a bug on a kernel version isn't listed on kernel.org, +contact your Linux distribution or embedded vendor for support. +Alternatively, you can attempt to run one of the supported stable or -rc +kernels, and see if you can reproduce the bug on that. It's preferable +to reproduce the bug on the latest -rc kernel. 
+ + +How to report Linux kernel bugs +=============================== + + +Identify the problematic subsystem +---------------------------------- + +Identifying which part of the Linux kernel might be causing your issue +increases your chances of getting your bug fixed. Simply posting to the +generic linux-kernel mailing list (LKML) may cause your bug report to be +lost in the noise of a mailing list that gets 1000+ emails a day. + +Instead, try to figure out which kernel subsystem is causing the issue, +and email that subsystem's maintainer and mailing list. If the subsystem +maintainer doesn't answer, then expand your scope to mailing lists like +LKML. + + +Identify who to notify +---------------------- + +Once you know the subsystem that is causing the issue, you should send a +bug report. Some maintainers prefer bugs to be reported via bugzilla +(https://bugzilla.kernel.org), while others prefer that bugs be reported +via the subsystem mailing list. + +To find out where to send an emailed bug report, find your subsystem or +device driver in the MAINTAINERS file. Search in the file for relevant +entries, and send your bug report to the person(s) listed in the "M:" +lines, making sure to Cc the mailing list(s) in the "L:" lines. When the +maintainer replies to you, make sure to 'Reply-all' in order to keep the +public mailing list(s) in the email thread. + +If you know which driver is causing issues, you can pass one of the driver +files to the get_maintainer.pl script: + perl scripts/get_maintainer.pl -f + +If it is a security bug, please copy the Security Contact listed in the +MAINTAINERS file. They can help coordinate bugfix and disclosure. See +Documentation/SecurityBugs for more information. + +If you can't figure out which subsystem caused the issue, you should file +a bug in kernel.org bugzilla and send email to +linux-kernel@vger.kernel.org, referencing the bugzilla URL. (For more +information on the linux-kernel mailing list see +http://www.tux.org/lkml/). + + +Tips for reporting bugs +----------------------- + +If you haven't reported a bug before, please read: + +http://www.chiark.greenend.org.uk/~sgtatham/bugs.html +http://www.catb.org/esr/faqs/smart-questions.html + +It's REALLY important to report bugs that seem unrelated as separate email +threads or separate bugzilla entries. If you report several unrelated +bugs at once, it's difficult for maintainers to tease apart the relevant +data. + + +Gather information +------------------ + +The most important information in a bug report is how to reproduce the +bug. This includes system information, and (most importantly) +step-by-step instructions for how a user can trigger the bug. + +If the failure includes an "OOPS:", take a picture of the screen, capture +a netconsole trace, or type the message from your screen into the bug +report. Please read "Documentation/oops-tracing.txt" before posting your +bug report. This explains what you should do with the "Oops" information +to make it useful to the recipient. + +This is a suggested format for a bug report sent via email or bugzilla. +Having a standardized bug report form makes it easier for you not to +overlook things, and easier for the developers to find the pieces of +information they're really interested in. If some information is not +relevant to your bug, feel free to exclude it. + +First run the ver_linux script included as scripts/ver_linux, which +reports the version of some important subsystems. Run this script with +the command "sh scripts/ver_linux". 
+ +Use that information to fill in all fields of the bug report form, and +post it to the mailing list with a subject of "PROBLEM: " for easy identification by the developers. + +[1.] One line summary of the problem: +[2.] Full description of the problem/report: +[3.] Keywords (i.e., modules, networking, kernel): +[4.] Kernel information +[4.1.] Kernel version (from /proc/version): +[4.2.] Kernel .config file: +[5.] Most recent kernel version which did not have the bug: +[6.] Output of Oops.. message (if applicable) with symbolic information + resolved (see Documentation/oops-tracing.txt) +[7.] A small shell script or example program which triggers the + problem (if possible) +[8.] Environment +[8.1.] Software (add the output of the ver_linux script here) +[8.2.] Processor information (from /proc/cpuinfo): +[8.3.] Module information (from /proc/modules): +[8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem) +[8.5.] PCI information ('lspci -vvv' as root) +[8.6.] SCSI information (from /proc/scsi/scsi) +[8.7.] Other information that might be relevant to the problem + (please look in /proc and include all information that you + think to be relevant): +[X.] Other notes, patches, fixes, workarounds: + + +Follow up +========= + +Expectations for bug reporters +------------------------------ + +Linux kernel maintainers expect bug reporters to be able to follow up on +bug reports. That may include running new tests, applying patches, +recompiling your kernel, and/or re-triggering your bug. The most +frustrating thing for maintainers is for someone to report a bug, and then +never follow up on a request to try out a fix. + +That said, it's still useful for a kernel maintainer to know a bug exists +on a supported kernel, even if you can't follow up with retests. Follow +up reports, such as replying to the email thread with "I tried the latest +kernel and I can't reproduce my bug anymore" are also helpful, because +maintainers have to assume silence means things are still broken. + +Expectations for kernel maintainers +----------------------------------- + +Linux kernel maintainers are busy, overworked human beings. Some times +they may not be able to address your bug in a day, a week, or two weeks. +If they don't answer your email, they may be on vacation, or at a Linux +conference. Check the conference schedule at LWN.net for more info: + https://lwn.net/Calendar/ + +In general, kernel maintainers take 1 to 5 business days to respond to +bugs. The majority of kernel maintainers are employed to work on the +kernel, and they may not work on the weekends. Maintainers are scattered +around the world, and they may not work in your time zone. Unless you +have a high priority bug, please wait at least a week after the first bug +report before sending the maintainer a reminder email. + +The exceptions to this rule are regressions, kernel crashes, security holes, +or userspace breakage caused by new kernel behavior. Those bugs should be +addressed by the maintainers ASAP. If you suspect a maintainer is not +responding to these types of bugs in a timely manner (especially during a +merge window), escalate the bug to LKML and Linus Torvalds. + +Thank you! 
+ +[Some of this is taken from Frohwalt Egerer's original linux-kernel FAQ] From be50eba020c7308c4358b8c4e05fdf5f7a02cccb Mon Sep 17 00:00:00 2001 From: John Vert Date: Fri, 5 May 2017 12:04:21 -0700 Subject: [PATCH 1013/1013] add -steamos EXTRAVERSION --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Makefile b/Makefile index a67c5179052a..ddb1e0b29b1f 100644 --- a/Makefile +++ b/Makefile @@ -2,7 +2,7 @@ VERSION = 4 PATCHLEVEL = 14 SUBLEVEL = 13 -EXTRAVERSION = +EXTRAVERSION = -steamos NAME = Petit Gorille # *DOCUMENTATION*