aboutsummaryrefslogtreecommitdiffstats
path: root/tools/perf/scripts/python/export-to-postgresql.py (unfollow)
AgeCommit message (Collapse)AuthorFilesLines
2020-06-15sched,bL_switcher: Convert to sched_set_fifo*()Peter Zijlstra1-2/+1
Because SCHED_FIFO is a broken scheduler model (see previous patches) take away the priority field, the kernel can't possibly make an informed decision. In this case, use fifo_low, because it only cares about being above SCHED_NORMAL. Effectively no change in behaviour. Cc: rmk+kernel@arm.linux.org.uk Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Nicolas Pitre <nico@fluxnic.net>
2020-06-15sched: Provide sched_set_fifo()Peter Zijlstra2-0/+50
SCHED_FIFO (or any static priority scheduler) is a broken scheduler model; it is fundamentally incapable of resource management, the one thing an OS is actually supposed to do. It is impossible to compose static priority workloads. One cannot take two well designed and functional static priority workloads and mash them together and still expect them to work. Therefore it doesn't make sense to expose the priority field; the kernel is fundamentally incapable of setting a sensible value, it needs systems knowledge that it doesn't have. Take away sched_setschedule() / sched_setattr() from modules and replace them with: - sched_set_fifo(p); create a FIFO task (at prio 50) - sched_set_fifo_low(p); create a task higher than NORMAL, which ends up being a FIFO task at prio 1. - sched_set_normal(p, nice); (re)set the task to normal This stops the proliferation of randomly chosen, and irrelevant, FIFO priorities that dont't really mean anything anyway. The system administrator/integrator, whoever has insight into the actual system design and requirements (userspace) can set-up appropriate priorities if and when needed. Cc: airlied@redhat.com Cc: alexander.deucher@amd.com Cc: awalls@md.metrocast.net Cc: axboe@kernel.dk Cc: broonie@kernel.org Cc: daniel.lezcano@linaro.org Cc: gregkh@linuxfoundation.org Cc: hannes@cmpxchg.org Cc: herbert@gondor.apana.org.au Cc: hverkuil@xs4all.nl Cc: john.stultz@linaro.org Cc: nico@fluxnic.net Cc: paulmck@kernel.org Cc: rafael.j.wysocki@intel.com Cc: rmk+kernel@arm.linux.org.uk Cc: sudeep.holla@arm.com Cc: tglx@linutronix.de Cc: ulf.hansson@linaro.org Cc: wim@linux-watchdog.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Ingo Molnar <mingo@kernel.org> Tested-by: Paul E. McKenney <paulmck@kernel.org>
2020-06-15sched/pelt: Cleanup PELT dividerVincent Guittot3-15/+24
Factorize in a single place the calculation of the divider to be used to to compute *_avg from *_sum value Suggested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200612154703.23555-1-vincent.guittot@linaro.org
2020-06-15sched/deadline: Fix a typo in a commentChristophe JAILLET1-1/+1
s/deadine/deadline/ Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200602195002.677448-1-christophe.jaillet@wanadoo.fr
2020-06-15sched/deadline: Implement fallback mechanism for !fit caseLuca Abeni1-4/+16
When a task has a runtime that cannot be served within the scheduling deadline by any of the idle CPU (later_mask) the task is doomed to miss its deadline. This can happen since the SCHED_DEADLINE admission control guarantees only bounded tardiness and not the hard respect of all deadlines. In this case try to select the idle CPU with the largest CPU capacity to minimize tardiness. Favor task_cpu(p) if it has max capacity of !fitting CPUs so that find_later_rq() can potentially still return it (most likely cache-hot) early. Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200520134243.19352-6-dietmar.eggemann@arm.com
2020-06-15sched/deadline: Make DL capacity-awareLuca Abeni3-5/+42
The current SCHED_DEADLINE (DL) scheduler uses a global EDF scheduling algorithm w/o considering CPU capacity or task utilization. This works well on homogeneous systems where DL tasks are guaranteed to have a bounded tardiness but presents issues on heterogeneous systems. A DL task can migrate to a CPU which does not have enough CPU capacity to correctly serve the task (e.g. a task w/ 70ms runtime and 100ms period on a CPU w/ 512 capacity). Add the DL fitness function dl_task_fits_capacity() for DL admission control on heterogeneous systems. A task fits onto a CPU if: CPU original capacity / 1024 >= task runtime / task deadline Use this function on heterogeneous systems to try to find a CPU which meets this criterion during task wakeup, push and offline migration. On homogeneous systems the original behavior of the DL admission control should be retained. Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200520134243.19352-5-dietmar.eggemann@arm.com
2020-06-15sched/deadline: Improve admission control for asymmetric CPU capacitiesLuca Abeni2-16/+20
The current SCHED_DEADLINE (DL) admission control ensures that sum of reserved CPU bandwidth < x * M where x = /proc/sys/kernel/sched_rt_{runtime,period}_us M = # CPUs in root domain. DL admission control works well for homogeneous systems where the capacity of all CPUs are equal (1024). I.e. bounded tardiness for DL and non-starvation of non-DL tasks is guaranteed. But on heterogeneous systems where capacity of CPUs are different it could fail by over-allocating CPU time on smaller capacity CPUs. On an Arm big.LITTLE/DynamIQ system DL tasks can easily starve other tasks making it unusable. Fix this by explicitly considering the CPU capacity in the DL admission test by replacing M with the root domain CPU capacity sum. Signed-off-by: Luca Abeni <luca.abeni@santannapisa.it> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200520134243.19352-4-dietmar.eggemann@arm.com
2020-06-15sched/deadline: Add dl_bw_capacity()Dietmar Eggemann1-0/+33
Capacity-aware SCHED_DEADLINE Admission Control (AC) needs root domain (rd) CPU capacity sum. Introduce dl_bw_capacity() which for a symmetric rd w/ a CPU capacity of SCHED_CAPACITY_SCALE simply relies on dl_bw_cpus() to return #CPUs multiplied by SCHED_CAPACITY_SCALE. For an asymmetric rd or a CPU capacity < SCHED_CAPACITY_SCALE it computes the CPU capacity sum over rd span and cpu_active_mask. A 'XXX Fix:' comment was added to highlight that if 'rq->rd == def_root_domain' AC should be performed against the capacity of the CPU the task is running on rather the rd CPU capacity sum. This issue already exists w/o capacity awareness. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200520134243.19352-3-dietmar.eggemann@arm.com
2020-06-15sched/deadline: Optimize dl_bw_cpus()Dietmar Eggemann1-1/+7
Return the weight of the root domain (rd) span in case it is a subset of the cpu_active_mask. Continue to compute the number of CPUs over rd span and cpu_active_mask when in hotplug. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lkml.kernel.org/r/20200520134243.19352-2-dietmar.eggemann@arm.com
2020-06-15sched: correct SD_flags returned by tl->sd_flags()Peng Liu1-1/+1
During sched domain init, we check whether non-topological SD_flags are returned by tl->sd_flags(), if found, fire a waning and correct the violation, but the code failed to correct the violation. Correct this. Fixes: 143e1e28cb40 ("sched: Rework sched_domain topology definition") Signed-off-by: Peng Liu <iwtbavbm@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20200609150936.GA13060@iZj6chx1xj0e0buvshuecpZ
2020-06-15sched/fair: Fix NOHZ next idle balanceVincent Guittot1-9/+14
With commit: 'b7031a02ec75 ("sched/fair: Add NOHZ_STATS_KICK")' rebalance_domains of the local cfs_rq happens before others idle cpus have updated nohz.next_balance and its value is overwritten. Move the update of nohz.next_balance for other idles cpus before balancing and updating the next_balance of local cfs_rq. Also, the nohz.next_balance is now updated only if all idle cpus got a chance to rebalance their domains and the idle balance has not been aborted because of new activities on the CPU. In case of need_resched, the idle load balance will be kick the next jiffie in order to address remaining ilb. Fixes: b7031a02ec75 ("sched/fair: Add NOHZ_STATS_KICK") Reported-by: Peng Liu <iwtbavbm@gmail.com> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Acked-by: Mel Gorman <mgorman@suse.de> Link: https://lkml.kernel.org/r/20200609123748.18636-1-vincent.guittot@linaro.org
2020-06-15sched/deadline: Impose global limits on sched_attr::sched_periodPeter Zijlstra3-2/+38
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20190726161357.397880775@infradead.org
2020-06-15isolcpus: Affine unbound kernel threads to housekeeping cpusMarcelo Tosatti3-3/+7
This is a kernel enhancement that configures the cpu affinity of kernel threads via kernel boot option nohz_full=. When this option is specified, the cpumask is immediately applied upon kthread launch. This does not affect kernel threads that specify cpu and node. This allows CPU isolation (that is not allowing certain threads to execute on certain CPUs) without using the isolcpus=domain parameter, making it possible to enable load balancing on such CPUs during runtime (see kernel-parameters.txt). Note-1: this is based off on Wind River's patch at https://github.com/starlingx-staging/stx-integ/blob/master/kernel/kernel-std/centos/patches/affine-compute-kernel-threads.patch Difference being that this patch is limited to modifying kernel thread cpumask. Behaviour of other threads can be controlled via cgroups or sched_setaffinity. Note-2: Wind River's patch was based off Christoph Lameter's patch at https://lwn.net/Articles/565932/ with the only difference being the kernel parameter changed from kthread to kthread_cpus. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200527142909.23372-3-frederic@kernel.org
2020-06-15kthread: Switch to cpu_possible_maskMarcelo Tosatti1-2/+2
Next patch will switch unbound kernel threads mask to housekeeping_cpumask(), a subset of cpu_possible_mask. So in order to ease bisection, lets first switch kthreads default affinity from cpu_all_mask to cpu_possible_mask. It looks safe to do so as cpu_possible_mask seem to be initialized at setup_arch() time, way before kthreadd is created. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200527142909.23372-2-frederic@kernel.org
2020-06-15psi: eliminate kthread_worker from psi trigger scheduling mechanismSuren Baghdasaryan2-52/+68
Each psi group requires a dedicated kthread_delayed_work and kthread_worker. Since no other work can be performed using psi_group's kthread_worker, the same result can be obtained using a task_struct and a timer directly. This makes psi triggering simpler by removing lists and locks involved with kthread_worker usage and eliminates the need for poll_scheduled atomic use in the hot path. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200528195442.190116-1-surenb@google.com
2020-06-15x86, sched: Bail out of frequency invariance if turbo_freq/base_freq gives 0Giovanni Gherdovich1-2/+9
Be defensive against the case where the processor reports a base_freq larger than turbo_freq (the ratio would be zero). Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance") Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200531182453.15254-4-ggherdovich@suse.cz
2020-06-15x86, sched: Bail out of frequency invariance if turbo frequency is unknownGiovanni Gherdovich1-2/+4
There may be CPUs that support turbo boost but don't declare any turbo ratio, i.e. their MSR_TURBO_RATIO_LIMIT is all zeroes. In that condition scale-invariant calculations can't be performed. Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance") Suggested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com> Link: https://lkml.kernel.org/r/20200531182453.15254-3-ggherdovich@suse.cz
2020-06-15x86, sched: check for counters overflow in frequency invariant accountingGiovanni Gherdovich2-6/+29
The product mcnt * arch_max_freq_ratio can overflows u64. For context, a large value for arch_max_freq_ratio would be 5000, corresponding to a turbo_freq/base_freq ratio of 5 (normally it's more like 1500-2000). A large increment frequency for the MPERF counter would be 5GHz (the base clock of all CPUs on the market today is less than that). With these figures, a CPU would need to go without a scheduler tick for around 8 days for the u64 overflow to happen. It is unlikely, but the check is warranted. Under similar conditions, the difference acnt of two consecutive APERF readings can overflow as well. In these circumstances is appropriate to disable frequency invariant accounting: the feature relies on measures of the clock frequency done at every scheduler tick, which need to be "fresh" to be at all meaningful. A note on i386: prior to version 5.1, the GCC compiler didn't have the builtin function __builtin_mul_overflow. In these GCC versions the macro check_mul_overflow needs __udivdi3() to do (u64)a/b, which the kernel doesn't provide. For this reason this change fails to build on i386 if GCC<5.1, and we protect the entire frequency invariant code behind CONFIG_X86_64 (special thanks to "kbuild test robot" <lkp@intel.com>). Fixes: 1567c3e3467c ("x86, sched: Add support for frequency invariance") Signed-off-by: Giovanni Gherdovich <ggherdovich@suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200531182453.15254-2-ggherdovich@suse.cz
2020-06-15sched/debug: Add new tracepoints to track util_estVincent Donnefort3-0/+16
The util_est signals are key elements for EAS task placement and frequency selection. Having tracepoints to track these signals enables load-tracking and schedutil testing and/or debugging by a toolkit. Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/1590597554-370150-1-git-send-email-vincent.donnefort@arm.com
2020-06-15sched/fair: Remove unused 'sd' parameter from scale_rt_capacity()Dietmar Eggemann1-2/+2
Since commit 8ec59c0f5f49 ("sched/topology: Remove unused 'sd' parameter from arch_scale_cpu_capacity()") it is no longer needed. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20200603080304.16548-5-dietmar.eggemann@arm.com
2020-06-15sched/idle,stop: Remove .get_rr_interval from sched_classDietmar Eggemann2-15/+0
The idle task and stop task sched_classes return 0 in this function. The single call site in sched_rr_get_interval() calls p->sched_class->get_rr_interval() only conditional in case it is defined. Otherwise time_slice=0 will be used. The deadline sched class does not define it. Commit a57beec5d427 ("sched: Make sched_class::get_rr_interval() optional") introduced the default time-slice=0 for sched classes which do not provide this function. So .get_rr_interval for idle and stop sched_class can be removed to shrink the code a little. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200603080304.16548-4-dietmar.eggemann@arm.com
2020-06-15sched/core: Remove redundant 'preempt' param from sched_class->yield_to_task()Dietmar Eggemann3-3/+3
Commit 6d1cafd8b56e ("sched: Resched proper CPU on yield_to()") moved the code to resched the CPU from yield_to_task_fair() to yield_to() making the preempt parameter in sched_class->yield_to_task() unnecessary. Remove it. No other sched_class implements yield_to_task(). Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200603080304.16548-3-dietmar.eggemann@arm.com
2020-06-15sched/pelt: Remove redundant cap_scale() definitionDietmar Eggemann1-2/+0
Besides in PELT cap_scale() is used in the Deadline scheduler class for scale-invariant bandwidth enforcement. Remove the cap_scale() definition in kernel/sched/pelt.c and keep the one in kernel/sched/sched.h. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20200603080304.16548-2-dietmar.eggemann@arm.com
2020-06-15sched/cputime: Improve cputime_adjust()Oleg Nesterov4-47/+56
People report that utime and stime from /proc/<pid>/stat become very wrong when the numbers are big enough, especially if you watch these counters incrementally. Specifically, the current implementation of: stime*rtime/total, results in a saw-tooth function on top of the desired line, where the teeth grow in size the larger the values become. IOW, it has a relative error. The result is that, when watching incrementally as time progresses (for large values), we'll see periods of pure stime or utime increase, irrespective of the actual ratio we're striving for. Replace scale_stime() with a math64.h helper: mul_u64_u64_div_u64() that is far more accurate. This also allows architectures to override the implementation -- for instance they can opt for the old algorithm if this new one turns out to be too expensive for them. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200519172506.GA317395@hirez.programming.kicks-ass.net
2020-06-14Linux 5.8-rc1Linus Torvalds1-2/+2
2020-06-14security: Add LSM hooks to set*gid syscallsThomas Cedeno5-1/+40
The SafeSetID LSM uses the security_task_fix_setuid hook to filter set*uid() syscalls according to its configured security policy. In preparation for adding analagous support in the LSM for set*gid() syscalls, we add the requisite hook here. Tested by putting print statements in the security_task_fix_setgid hook and seeing them get hit during kernel boot. Signed-off-by: Thomas Cedeno <thomascedeno@google.com> Signed-off-by: Micah Morton <mortonm@chromium.org>
2020-06-14Revert "btrfs: switch to iomap_dio_rw() for dio"David Sterba4-166/+169
This reverts commit a43a67a2d715540c1368b9501a22b0373b5874c0. This patch reverts the main part of switching direct io implementation to iomap infrastructure. There's a problem in invalidate page that couldn't be solved as regression in this development cycle. The problem occurs when buffered and direct io are mixed, and the ranges overlap. Although this is not recommended, filesystems implement measures or fallbacks to make it somehow work. In this case, fallback to buffered IO would be an option for btrfs (this already happens when direct io is done on compressed data), but the change would be needed in the iomap code, bringing new semantics to other filesystems. Another problem arises when again the buffered and direct ios are mixed, invalidation fails, then -EIO is set on the mapping and fsync will fail, though there's no real error. There have been discussions how to fix that, but revert seems to be the least intrusive option. Link: https://lore.kernel.org/linux-btrfs/20200528192103.xm45qoxqmkw7i5yl@fiona/ Signed-off-by: David Sterba <dsterba@suse.com>
2020-06-13net: ethernet: ti: ale: fix allmulti for nu type aleGrygorii Strashko1-9/+40
On AM65xx MCU CPSW2G NUSS and 66AK2E/L NUSS allmulti setting does not allow unregistered mcast packets to pass. This happens, because ALE VLAN entries on these SoCs do not contain port masks for reg/unreg mcast packets, but instead store indexes of ALE_VLAN_MASK_MUXx_REG registers which intended for store port masks for reg/unreg mcast packets. This path was missed by commit 9d1f6447274f ("net: ethernet: ti: ale: fix seeing unreg mcast packets with promisc and allmulti disabled"). Hence, fix it by taking into account ALE type in cpsw_ale_set_allmulti(). Fixes: 9d1f6447274f ("net: ethernet: ti: ale: fix seeing unreg mcast packets with promisc and allmulti disabled") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-13net: ethernet: ti: am65-cpsw-nuss: fix ale parameters initGrygorii Strashko1-1/+1
The ALE parameters structure is created on stack, so it has to be reset before passing to cpsw_ale_create() to avoid garbage values. Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-13net: atm: Remove the error message according to the atomic contextLiao Pingfang1-3/+1
Looking into the context (atomic!) and the error message should be dropped. Signed-off-by: Liao Pingfang <liao.pingfang@zte.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>